(57) Abstract

Disclosed are novel isolated nucleic acids and substantially pure protein preparations for naturally occurring and synthetic or chimeric
heparan sulfate D-glucosaminyl 3-O-sulfotransferases (3-OSTs). Also disclosed are uses for these genes and proteins, including uses for
the modification and sequencing of glycosaminoglycans.
HEPARAN SULFATE D-GLUCOSAMINYL 3-O-SULFOTRANSFERASES, AND USES THEREFOR

Field of the Invention 

The present invention is related to the field of biochemistry and molecular biology, and in 
particular to the field of enzymology and heparan sulfate biosynthesis. 

Background of the Invention 
The serine proteases of the intrinsic blood coagulation cascade are slowly neutralized by
antithrombin (AT) (reviewed in (1)). This inhibition is secondary to the generation of 1:1
enzyme-AT complexes whose formation is dramatically enhanced by the mast cell product,
heparin (2). Damus et al. (3) hypothesized that endothelial cell surface heparan sulfate
proteoglycans (HSPGs) function in a similar fashion to accelerate coagulation enzyme inactivation
by AT, and therefore are responsible for the non-thrombogenic properties of blood vessels. It
was initially demonstrated that perfusion of the hindlimbs of normal rodents and rodents deficient
in mast cells with purified thrombin (T) and AT leads to a greatly elevated rate of T-AT complex
formation and that the enzyme heparitinase as well as the natural heparin antagonist platelet factor
4 suppress the above acceleration (4, 5). It was subsequently shown that cultured cloned bovine
macrovascular and rodent microvascular endothelial cells synthesize both anticoagulant HSPG
(HSPG^act) as well as nonanticoagulant HSPG (HSPG^inact) (6-8). HSPG^act bear glycosaminoglycan
(GAG) chains that bind tightly to AT and accelerate T-AT complex generation (6-8).

The biosynthesis of HSPG^act requires generation of a core protein, assembly of a linkage
region of four neutral sugars on specific serine attachment sites of the core protein, elongation of
a GAG backbone composed of alternating N-acetylglucosamine and glucuronic acid residues, and
modification of this homogenous copolymer by partial N-deacetylation with coupled N-sulfation
of glucosamine residues, partial epimerization of glucuronic acid to iduronic acid residues, partial
2-O-sulfation of uronic acid residues, and partial 6-O-sulfation and partial 3-O-sulfation of
glucosamine residues (reviewed in (9)). This multienzyme pathway generates HSPG^act with
regions of defined structure that contain the primary AT binding domain sequence found in
anticoagulant heparin: uronic acid -> glucosamine (N-acetyl/N-sulfate) 6-O-sulfate -> glucuronic
acid -> glucosamine N-sulfate 3-O-sulfate (6-O-sulfate) -> iduronic acid 2-O-sulfate -> glucosamine
N-sulfate 6-O-sulfate (10-17). These reactions also produce HSPG^inact with regions of varying
monosaccharide sequence that lack the primary AT binding domain. The structure-function
relationships of the AT binding domain have been elucidated with heparin/heparan sulfate
oligosaccharides in association with fast reaction kinetics and equilibrium binding assays. The
6-O-sulfate group on residue 2 and the 3-O-sulfate group on residue 4 function in a
thermodynamically linked fashion to supply half of the binding energy for interaction with AT, and
trigger a conformational event that accelerates neutralization of specific coagulation proteases
(11, 12). The amino and ester sulfate groups at residues 5 and 6, as well as carboxyl groups at
other sites, provide the other half of the binding energy for interaction with the protease inhibitor
(10, 11). Furthermore, monosaccharide sequences outside the primary AT binding domain are
essential in facilitating inhibition of coagulation proteases other than factor Xa (18, 19).

During the past eight years, several biosynthetic enzymes that generate HSPG^act and
HSPG^inact have been purified. These proteins include an N-acetylglucosamine/glucuronic acid
copolymerase (20), N-deacetylase/N-sulfotransferases (NST-1 and NST-2) (21, 22), a glucuronic
acid/iduronic acid epimerase (23), an iduronic acid/glucuronic acid 2-O-sulfotransferase (2-OST)
(24), a glucosamine 6-O-sulfotransferase (6-OST) (25) and a glucosamine 3-O-sulfotransferase
(3-OST) (26, 35). However, the only enzymes that have also been molecularly cloned are two
structurally and functionally distinct isoforms of N-deacetylase/N-sulfotransferase (NST-1 from
liver and NST-2 from mastocytoma) (27-31), and the 2-OST and epimerase. The above enzymes
must function in a coordinated manner to produce the AT binding domain because the abundance
of this sequence is much greater than predicted from a random assembly of constituents (32). The
postulated regulatory mechanism must direct the biosynthetic enzymes to carry out the
appropriate sequence of epimerization/sulfation reactions to generate the AT binding domain (33,
34).

Summary of the Invention
The present invention depends, in part, upon the identification and molecular cloning of
novel genes encoding mammalian heparan sulfate D-glucosaminyl 3-O-sulfotransferases
(3-OSTs). In particular, as disclosed herein, the present invention provides nucleic acid (SEQ ID
NO: 1) and amino acid (SEQ ID NO: 2) sequences for murine 3-OST-1; nucleic acid (SEQ ID
NO: 3) and amino acid (SEQ ID NO: 4) sequences for human 3-OST-1; nucleic acid (SEQ ID
NO: 5) and amino acid (SEQ ID NO: 6) sequences for human 3-OST-2; nucleic acid (SEQ ID
NO: 7) and amino acid (SEQ ID NO: 8) sequences for human 3-OST-3A; nucleic acid (SEQ ID
NO: 9) and amino acid (SEQ ID NO: 10) sequences for human 3-OST-3B; and nucleic acid (SEQ
ID NO: 11) and amino acid (SEQ ID NO: 12) sequences for human 3-OST-4. In addition, the
invention provides amino acid (SEQ ID NO: 15) sequences for a C. elegans homologue,
ce3-OST.

Thus, in one aspect, the present invention provides isolated nucleic acids encoding at least
a functional fragment of a 3-OST protein. In preferred embodiments, the nucleic acid encodes a
3-OST protein comprising a mature murine or human 3-OST-1. In other embodiments, the
nucleic acid encodes a 3-OST protein selected from 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B,
3-OST-4, and ce3-OST. In other preferred embodiments, the nucleic acid encodes a
3-O-sulfotransferase domain of a 3-OST protein selected from 3-OST-1, 3-OST-2, 3-OST-3A,
3-OST-3B, 3-OST-4, and ce3-OST. In particular embodiments, the nucleic acid comprises a
nucleotide sequence selected from nucleotide sequences within: (a) SEQ ID NO: 1; (b) SEQ
ID NO: 3; (c) SEQ ID NO: 5; (d) SEQ ID NO: 7; (e) SEQ ID NO: 9; (f) SEQ ID NO: 11;
(g) a sequence having at least 60% nucleotide sequence identity with at least one of (a)-(f) and
encoding a functional fragment having sequence-specific HS binding affinity or
3-O-sulfotransferase activity; and (h) a sequence differing from a sequence of (a)-(g) only by the
substitution of synonymous codons. In other particular embodiments, the present invention
provides an isolated nucleic acid encoding a polypeptide selected from: (a) residues 21-52,
260-269, 250-276, 53-311, or 21-307 of SEQ ID NO: 2; (b) residues 21-48, 256-265, 246-272,
49-307, or 21-303 of SEQ ID NO: 4; (c) residues 42-109, 313-325, 303-332, or 110-367 of SEQ
ID NO: 6; (d) residues 44-147, 351-363, 341-370, or 148-406 of SEQ ID NO: 8; (e) residues
66-132, 336-348, 326-355, or 133-390 of SEQ ID NO: 10; (f) residues 396-408, 386-415, or
207-456 of SEQ ID NO: 12; (g) residues 240-250, 230-257, or 23-291 of SEQ ID NO: 15; (h) a
sequence having at least 60% amino acid sequence similarity with at least one of (a)-(g) and
encoding a functional fragment having sequence-specific HS binding affinity or
3-O-sulfotransferase activity; and (i) a sequence comprising a chimera of at least two of sequences
(a)-(h).

In another aspect, the present invention provides isolated nucleic acids comprising at least
16 consecutive nucleotides of a nucleotide sequence selected from SEQ ID NO: 1, SEQ ID NO:
3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, and SEQ ID NO: 11.
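
By way of illustration only, the "at least 16 consecutive nucleotides" criterion can be checked computationally as a shared-substring test. The following is a minimal sketch under stated assumptions: the function name `shares_run` and the toy strings are hypothetical, the strings are not the disclosed SEQ ID sequences, and reverse-complement matches are not considered.

```python
def shares_run(candidate: str, reference: str, k: int = 16) -> bool:
    """Return True if `candidate` contains at least k consecutive nucleotides of `reference`."""
    if k <= 0:
        raise ValueError("k must be positive")
    candidate = candidate.upper()
    reference = reference.upper()
    if len(candidate) < k or len(reference) < k:
        return False
    # Collect every k-long window of the reference, then look for any of them in the candidate.
    windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(candidate[i:i + k] in windows for i in range(len(candidate) - k + 1))


# Toy strings only (hypothetical); these are NOT the disclosed sequences.
reference = "ATGGCCGCGCTGCTCCTGGGCGCGGTGCTGCTG"
probe_hit = "TTT" + reference[5:21] + "AAA"   # embeds a 16-nucleotide run of the reference
probe_miss = "ACGT" * 8                        # shares no 16-nucleotide run with the reference

print(shares_run(probe_hit, reference), shares_run(probe_miss, reference))
```

A set of reference windows keeps the lookup simple; for long references an indexed search would be a reasonable substitute.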

In another aspect, the present invention provides for cells and cell lines transformed with
the nucleic acids of the present invention. Thus, the invention provides host cells transformed
with any of the above-described nucleic acids. The transformed host cells may be bacterial, yeast,
or insect cells. Preferably, however, the host cells are mammalian cells, including endothelial
cells, mast cells, fibroblasts, hybridomas, oocytes, and embryonic stem cells. Examples of
preferred mammalian cells include COS-7 cells, murine primary cardiac microvascular endothelial
cells (CME), murine mast cell line C57.1, primary human endothelial cells of umbilical vein
(HUVEC), F9 embryonal carcinoma cells, rat fat pad endothelial cells (RFPEC), L cells (e.g.,
murine LTA tk- cells), and cells derived from the transgenic animals of the invention. The
transformed host cells may also be fetal cells, embryonic stem cells, zygotes, gametes, or germ
line cells. Transformed embryonic stem cells, zygotes, gametes, and germ line cells, as well as
other mammalian cells, may be used to produce transgenic animals in which the expression of
3-OST genes has been altered (e.g., knock-outs, enhanced expression, ectopic expression).

In another aspect, the present invention provides substantially pure protein preparations
comprising at least a functional fragment of a 3-OST protein. Thus, in one embodiment, the
present invention provides a substantially pure protein preparation comprising mature murine
3-OST-1 or mature human 3-OST-1. In another embodiment, the 3-OST protein is selected from
the group consisting of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST. In
another embodiment, the fragment comprises a 3-O-sulfotransferase domain of a 3-OST protein
selected from the group consisting of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and
ce3-OST. In particular embodiments, the present invention provides a substantially pure protein
preparation in which the 3-OST protein comprises an amino acid sequence selected from: (a)
SEQ ID NO: 2; (b) SEQ ID NO: 4; (c) SEQ ID NO: 6; (d) SEQ ID NO: 8; (e) SEQ ID NO:
10; (f) SEQ ID NO: 12; (g) SEQ ID NO: 15; and (h) a sequence having at least 60% amino
acid similarity with at least one of (a)-(g) and having sequence-specific HS binding affinity or
3-O-sulfotransferase activity. In other particular embodiments, the present invention provides a
substantially pure protein preparation in which the 3-OST protein comprises an amino acid
sequence selected from: (a) residues 21-52, 260-269, 250-276, 53-311, or 21-307 of SEQ ID
NO: 2; (b) residues 21-48, 256-265, 246-272, 49-307, or 21-303 of SEQ ID NO: 4; (c)
residues 42-109, 313-325, 303-332, or 110-367 of SEQ ID NO: 6; (d) residues 44-147,
351-363, 341-370, or 148-406 of SEQ ID NO: 8; (e) residues 66-132, 336-348, 326-355, or
133-390 of SEQ ID NO: 10; (f) residues 396-408, 386-415, or 207-456 of SEQ ID NO: 12; (g)
residues 240-250, 230-257, or 23-291 of SEQ ID NO: 15; (h) a sequence having at least 60%
amino acid sequence similarity with at least one of (a)-(g) and encoding a functional fragment
having sequence-specific HS binding affinity or 3-O-sulfotransferase activity; and (i) a sequence
comprising a chimera of at least two of sequences (a)-(h).

In another aspect, the present invention provides for antibodies and methods for making
antibodies which selectively bind with the 3-OST proteins. These antibodies include monoclonal
and polyclonal antibodies, as well as functional antibody fragments such as F(ab) and Fc.

In another aspect, the present invention provides for methods for producing the
above-described proteins. Thus, in one set of embodiments, the isolated nucleic acids of the invention
may be used to transform host cells or create transgenic animals which express the proteins of the
invention. The proteins may then be substantially purified from the cells or animals by standard
methods. Alternatively, the isolated nucleic acids of the invention may be used in cell-free in vitro
translation systems to produce the proteins of the invention.

In another aspect, the present invention provides methods for 3-O-sulfating saccharide
residues within a preparation of glycosaminoglycan or proteoglycan polysaccharides by contacting
the preparation with at least a 3-O-sulfotransferase domain of a 3-OST protein in the presence of
a sulfate donor under conditions which permit sulfation of the residues, and wherein the 3-OST
protein is selected from 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST
proteins, as well as conservative substitution variants and/or chimeras thereof. In particular
embodiments, the present invention provides methods for 3-O-sulfating saccharide residues within
a preparation of glycosaminoglycan or proteoglycan polysaccharides in which the polysaccharides
include a polysaccharide sequence of GlcA->GlcNS ±6S. These methods comprise contacting
the GlcA->GlcNS ±6S-containing polysaccharide preparation with a 3-OST-1 protein in the
presence of a sulfate donor under conditions which permit the 3-OST-1 to convert the
GlcA->GlcNS ±6S sequence to GlcA->GlcNS 3S ±6S. In particular embodiments, the
GlcA->GlcNS ±6S sequence comprises a part of an HS^act precursor sequence (i.e., IdoA->
GlcNAc 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S or IdoA->GlcNS 6S->GlcA->GlcNS
±6S->IdoA 2S->GlcNS 6S) or a part of an HS^inact precursor sequence (i.e., IdoA->GlcNAc->
GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S; IdoA->GlcNS->GlcA->GlcNS ±6S->IdoA
2S->GlcNS 6S; IdoA->GlcNAc 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS; or IdoA->
GlcNS 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS). Conversion of the HS^act precursor pool to
HS^act increases the fraction with AT-binding activity and is particularly useful in the production of
anticoagulant heparan sulfate products. Thus, in another embodiment, the present invention
provides for means of enriching the AT-binding fraction of a heparan sulfate pool by contacting
the polysaccharide preparation with 3-OST-1 protein in the presence of a sulfate donor under
conditions which permit the 3-OST HS^act conversion activity. The 3-OST-1 protein for use in
these methods is selected from murine 3-OST-1, human 3-OST-1, mature murine 3-OST-1,
mature human 3-OST-1, a functional fragment of a 3-OST-1 having 3-O-sulfotransferase activity,
a conservative substitution variant of 3-OST-1 having 3-O-sulfotransferase activity, and a
chimeric 3-OST-1 having 3-O-sulfotransferase activity. In preferred embodiments, the sulfate
donor is 3'-phospho-adenosine 5'-phosphosulfate (PAPS).

Similarly, the present invention provides methods for 3-O-sulfating saccharide residues
within a preparation of glycosaminoglycan or proteoglycan polysaccharides by contacting the
preparation with at least a 3-O-sulfotransferase domain of a 3-OST protein in the presence of a
sulfate donor under conditions which permit sulfation of the residues, and wherein the 3-OST
protein is selected from 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, ce3-OST and conservative
substitution variants or chimeras thereof. In particular embodiments, the present invention
provides methods for 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan
or proteoglycan polysaccharides in which the polysaccharides include a polysaccharide sequence
of GlcA 2S->GlcNS. These methods comprise contacting the GlcA 2S->GlcNS-containing
polysaccharide preparation with a 3-OST-2 protein in the presence of a sulfate donor under
conditions which permit the 3-OST-2 protein to convert the GlcA 2S->GlcNS sequence to GlcA
2S->GlcNS 3S. In particular embodiments, the GlcA 2S->GlcNS sequence comprises a part of a
GlcNS->GlcA 2S->GlcNS sequence. In other particular embodiments, the present invention
provides methods for 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan
or proteoglycan polysaccharides in which the polysaccharides include a polysaccharide sequence
of IdoA 2S->GlcNS. These methods comprise contacting the IdoA 2S->GlcNS-containing
polysaccharide preparation with a 3-OST-3 protein in the presence of a sulfate donor under
conditions which permit the 3-OST-3 protein to convert the IdoA 2S->GlcNS sequence to IdoA
2S->GlcNS 3S. In particular embodiments, the IdoA 2S->GlcNS sequence comprises a part of a
GlcNS->IdoA 2S->GlcNS sequence. The 3-OST proteins for use in these methods are selected
from 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, ce3-OST, functional fragments of these 3-OSTs
having 3-O-sulfotransferase activity, conservative substitution variants of these 3-OSTs having
3-O-sulfotransferase activity, and chimeric 3-OSTs having 3-O-sulfotransferase activity. In
preferred embodiments, the sulfate donor is 3'-phospho-adenosine 5'-phosphosulfate (PAPS).

In another aspect, the present invention provides methods for partially sequencing
complex polysaccharides such as heparan sulfates or other glycosaminoglycans (GAGs). In these
methods, a pool of polysaccharides which includes sequences which may be 3-O-sulfated is
contacted with a 3-OST protein in the presence of a sulfate donor (e.g., PAPS) under conditions
which permit sulfation by the 3-OST. The treated polysaccharides are then subjected to
degradation by enzymes which degrade polysaccharides in a sequence-specific manner (e.g.,
polysaccharide lyases; heparinase I, II or III; heparitinase) and the size profile of the resulting
fragments is determined. An identical pool which has not been treated with 3-OST is similarly
cleaved by the same enzymes and a size profile determined. Changes in the size profiles indicate
that 3-OST activity has modified the saccharide units so as to prevent (or permit) cleavage at sites
which previously were (or were not) cleaved. Thus, comparison of the profiles will indicate
positions at which the target sequences for 3-OST activity are present and provide a partial
polysaccharide sequence.
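
The profile comparison described above can be sketched, for illustration only, as a difference between two fragment-size histograms. In this minimal sketch the function name `profile_changes`, the dictionary representation (fragment size in saccharide units mapped to relative abundance) and the example numbers are all hypothetical assumptions; no particular detection method or real data is implied.

```python
def profile_changes(untreated: dict, treated: dict, min_delta: float = 0.0) -> dict:
    """Return {fragment_size: treated - untreated} for sizes whose abundance changed.

    A negative value means fewer fragments of that size after 3-OST treatment;
    a positive value means more. Changes of magnitude <= min_delta are ignored.
    """
    changes = {}
    for size in sorted(set(untreated) | set(treated)):
        delta = treated.get(size, 0.0) - untreated.get(size, 0.0)
        if abs(delta) > min_delta:
            changes[size] = delta
    return changes


# Hypothetical profiles: keys are fragment sizes, values are relative abundances.
untreated_profile = {2: 10.0, 4: 5.0, 6: 1.0}
treated_profile = {2: 6.0, 4: 5.0, 6: 3.0, 8: 2.0}

print(profile_changes(untreated_profile, treated_profile))
```

In this toy case the loss of small fragments and the gain of larger ones would be read as cleavage sites that became resistant after 3-O-sulfation.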

In another embodiment, the sequence of complex polysaccharides such as HS or GAGs
may be partially determined using sequence specific polysaccharide affinity fractionation. To this
end, 3-OST proteins which lack enzymatic function but retain sequence-specific HS or GAG
binding capacity can be identified or produced (e.g., altering or deleting a portion of the catalytic
ST domain by site-directed mutagenesis or deletion mutagenesis). These inactive forms will bind
HS or GAGs in a sequence dependent manner and allow sequence-specific saccharide affinity
fractionation from complex mixtures of GAGs. The purified structures may be degraded in a
step-wise fashion with exolytic, endolytic enzymes and/or nitrous acid, and the resulting
degradation products can be compared to standard compounds of known structure. This method
will allow the quantitation and characterization of known structures contained within unknown
complex polysaccharide samples.

In another embodiment, partial sequence information can be obtained using the 3-OSTs of
the invention or other heparan sulfate sequence specific binding ligands as protective groups prior
to treating the HS or GAG with modifying agents that detectably alter the HS or GAG. Useful
protective groups include catalytically inactive enzymes, chimeric enzymes and small molecule
ligands with identified sequence binding specificities. The protecting group is contacted with the
heparan or other glycosaminoglycans (GAGs), and the resultant complex is treated with one or
more modifying agents. Useful modifying agents include catalytically active heparan lyases,
sulfotransferases, N-deacetylases, N-acetyltransferases, epimerases, or chimeric proteins of the
invention. In embodiments where multiple protecting groups and/or modifying reagents are used
in combination, the sample is first contacted with the protective group, then one or more
modifying reagents may be contacted with the protected polysaccharide, either
simultaneously or in turn. The protective group(s) will interfere with the ability of a modifying
agent to interact with, attach to and/or cleave specific GAG sequence motifs. The sample can
then be analyzed for ligand-specific protection and/or cleavage to elucidate the sequence of the
original GAG using separation and/or quantitation methods known in the art.

In another set of embodiments, the present invention provides isolated nucleic acids
comprising genetic regulatory sequences of a 3-OST gene operably joined to a marker gene.
Such regulatory sequences include 5' untranslated regions such as promoter and operator
sequences. The 5' regulatory sequences of the human 3-OST-4 gene (as well as coding regions)
are disclosed herein as SEQ ID NO: 16. Such regulatory regions may be used to transform host
cells, which are useful in methods of identifying compounds capable of modulating the expression
of the 3-OST gene. Thus, in such methods, a candidate compound is contacted with a host cell
transformed with a marker gene operably joined to the 3-OST regulatory regions, and changes in
expression of the marker gene are indicative of the ability of the candidate compound to modulate
3-OST expression.

In another aspect, the present invention also provides methods for diagnosing individuals
with disorders involving heparan sulfate biosynthesis comprising assaying such individuals for the
presence of mutations in 3-OST genes/proteins. Such assays include nucleic acid based assays
(employing the nucleic acids of the present invention), protein based assays (employing the
antibodies of the present invention), and HS based assays employing the glycosaminoglycan
sequencing methods of the present invention.

These and other aspects of the present invention will be apparent to one of ordinary skill
in the art from the following detailed description.

Brief Description of the Drawings 
Fig. 1 is an alignment of the amino acid sequences of murine and human 3-OST-1 proteins
showing the high degree of homology. Vertical bars ( | ) between residues indicate identical
residues.

Fig. 2 is an alignment of the sulfotransferase domains of human NST-1, human NST-2,
C. elegans 3-OST, human 3-OST-4, human 3-OST-3A, human 3-OST-2, and human 3-OST-1.

Fig. 3 is a schematic depiction of the structures of the 3-OST-1, 3-OST-2, 3-OST-3A,
3-OST-3B and 3-OST-4 proteins.

Detailed Description of the Invention 

Definitions 

In order to more clearly and distinctly point out and describe the subject matter that
applicants regard as the invention, the following definitions are provided for certain terms used in
the following written description and the appended claims.

Isolated nucleic acids. As used herein with respect to nucleic acids derived from
naturally-occurring sequences, the term "isolated nucleic acid" means a ribonucleic or deoxyribonucleic
acid which comprises a naturally-occurring nucleotide sequence and which is manipulable by
standard recombinant DNA techniques, but which is not covalently joined to the nucleotide
sequences that are immediately contiguous on its 5' and 3' ends in the naturally-occurring genome
of the organism from which it is derived. As used herein with respect to synthetic nucleic acids,
the term "isolated nucleic acid" means a ribonucleic or deoxyribonucleic acid which comprises a
nucleotide sequence which does not occur in nature and which is manipulable by standard
recombinant DNA techniques. An isolated nucleic acid is manipulable by standard recombinant
DNA techniques when it may be used in, for example, amplification by polymerase chain reaction
(PCR), in vitro translation, ligation to other nucleic acids (e.g., cloning or expression vectors),
restriction from other nucleic acids (e.g., cloning or expression vectors), transformation of cells,
hybridization screening assays, or the like. The term "isolated nucleic acids" is also intended to
embrace synthetic oligonucleotides such as peptide nucleic acids (PNAs), nucleotides joined by
phosphorothioate or other non-phosphodiester linkages, nucleic acids incorporating functionally
equivalent nucleotide analogs, and the like.

Transformation. As used herein, the term "transformation" means any method of introducing an
exogenous nucleic acid into a cell including, but not limited to, transformation, transfection,
electroporation, microinjection, direct injection of naked nucleic acid, particle-mediated delivery,
viral-mediated transduction or any other means of delivering a nucleic acid into a host cell which
results in transient or stable expression of said nucleic acid or integration of said nucleic acid into the
genome of said host cell or descendant thereof.

Substantially pure. As used herein with respect to protein preparations, the term
"substantially pure" means a preparation which contains at least 60% (by dry weight) the protein
of interest, exclusive of the weight of other intentionally included compounds. Preferably the
preparation is at least 75%, more preferably at least 90%, and most preferably at least 99%, by
dry weight the protein of interest, exclusive of the weight of other intentionally included
compounds. Purity can be measured by any appropriate method, e.g., column chromatography,
gel electrophoresis, or HPLC analysis. If a preparation intentionally includes two or more
different proteins of the invention, a "substantially pure" preparation means a preparation in which
the total dry weight of the proteins of the invention is at least 60% of the total dry weight,
exclusive of the weight of other intentionally included compounds. Preferably, for such
preparations containing two or more proteins of the invention, the total weight of the proteins of
the invention is at least 75%, more preferably at least 90%, and most preferably at least 99%, of
the total dry weight of the preparation, exclusive of the weight of other intentionally included
compounds. Thus, if the proteins of the invention are mixed with one or more other proteins
(e.g., serum albumin, 6-OST) or compounds (e.g., diluents, detergents, excipients, salts,
polysaccharides, sugars, lipids) for purposes of administration, stability, storage, and the like, the
weight of such other proteins or compounds is ignored in the calculation of the purity of the
preparation.
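
The purity calculation defined above can be illustrated with a short worked example. This is a minimal sketch only: the function names, the choice of milligrams as the unit, and the example weights are hypothetical assumptions, and the measurement of the dry weights themselves (e.g., by chromatography or electrophoresis) is outside its scope.

```python
def purity_percent(proteins_of_interest_mg, unintended_mg, intentionally_included_mg=0.0):
    """Percent purity by dry weight, ignoring intentionally included compounds.

    proteins_of_interest_mg: iterable of dry weights, one per protein of the invention.
    unintended_mg: dry weight of everything that is neither a protein of interest
        nor intentionally included.
    intentionally_included_mg: accepted for bookkeeping but deliberately NOT counted.
    """
    interest = sum(proteins_of_interest_mg)
    counted_total = interest + unintended_mg  # intentionally included weight is excluded
    if counted_total <= 0:
        raise ValueError("no countable dry weight in the preparation")
    return 100.0 * interest / counted_total


def is_substantially_pure(percent, threshold=60.0):
    """Apply the 60% floor; 75, 90 or 99 can be passed for the preferred tiers."""
    return percent >= threshold


# Hypothetical preparation: two proteins of the invention (3 mg each), 4 mg of
# unintended material, and 100 mg of serum albumin added on purpose for stability.
example = purity_percent([3.0, 3.0], 4.0, intentionally_included_mg=100.0)
print(example, is_substantially_pure(example), is_substantially_pure(example, 75.0))
```

The added albumin dominates the total mass of the example preparation yet does not lower the computed purity, which is the point of the exclusion in the definition.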

Similarity. As used herein with respect to amino acid sequences, the "similarity" between
two sequences means the percentage of amino acid residue positions, after aligning the sequences
according to standard techniques, at which the two sequences have identical or similar residues.
In general, "similar" residues include those which are regarded in the art as "conservative
substitutions" (see, e.g., Dayhoff et al. (1978), Atlas of Protein Sequence and Structure Vol. 5
(Suppl. 3), pp. 345-352, Natl. Biomed. Res. Found., Washington, D.C.); which fall within the
groups (a) methionine, leucine, isoleucine and valine, (b) phenylalanine, tyrosine and tryptophan,
(c) lysine, arginine and histidine, (d) alanine and glycine, (e) serine and threonine, (f) glutamine
and asparagine, and (g) glutamate and aspartate; or which are otherwise shown to have no
substantial effect on the biological activity of the protein. Numerical values for similarity were
determined using the PileUp program. This program performed multiple sequence alignments
based on methods of Feng and Doolittle (1987) J. Mol. Evol. 35: 351-360, and Higgins and Sharp
(1998), CABIOS 5:151-153. Using these methods for each sequence alignment, the gap weight
was set at 3.0 and the gap length was set at 0.10. Percentages of similarity recited in the
appended claims may be determined by these methods.
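
For illustration, the percentage described above can be computed from two sequences that have already been aligned. The sketch below is not the PileUp program and does not perform any alignment; it assumes equal-length, pre-aligned one-letter sequences with "-" for gaps, uses only residue groups (a)-(g) as the test for "similar", and counts every column holding at least one residue in the denominator. The function name and the denominator convention are assumptions made for the example.

```python
# Conservative-substitution groups (a)-(g) from the definition, in one-letter code.
GROUPS = ["MLIV", "FYW", "KRH", "AG", "ST", "QN", "ED"]
GROUP_OF = {residue: index for index, group in enumerate(GROUPS) for residue in group}


def percent_similarity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions with identical or same-group residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    # Drop columns that are gaps in both sequences; they carry no residue at all.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    similar = 0
    for x, y in columns:
        if "-" in (x, y):
            continue  # a residue opposite a gap is neither identical nor similar
        if x == y or (x in GROUP_OF and GROUP_OF.get(x) == GROUP_OF.get(y)):
            similar += 1
    return 100.0 * similar / len(columns)


# Toy alignment (hypothetical): M/L, K/R and S/T are similar pairs, T/A is not.
print(percent_similarity("MKST", "LRTA"))
```

Residues outside groups (a)-(g), such as cysteine or proline, count only when identical under this simplification.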

WO 99/22005 (PCT/US98/22597)

Chimeric protein. As used herein, the term "chimeric protein" means a protein having an
amino acid sequence which is a positionally conserved combination of the amino acid sequences
of two or more other proteins. Thus, for a chimera of two or more reference proteins, the amino
acid sequences of the reference proteins are aligned by standard techniques to identify residues
which correspond at each position, allowing for relative insertions/deletions as necessary. Then,
for each amino acid position of the chimeric protein, an amino acid residue is selected from the
residues present at corresponding positions in the two or more reference proteins (allowing for no
residue in the chimera when deletions are present amongst the reference proteins). The resultant
chimera has an amino acid sequence which is a combination of the reference amino acid
sequences, in which the relative position of each residue selected from the reference sequences is
conserved within the chimera.
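
The positional rule in this definition can be expressed as a column-by-column check, shown here for illustration only. The sketch assumes that the reference sequences and the candidate chimera have already been aligned to a common length with "-" marking deletions; the function name and the toy sequences are hypothetical.

```python
def is_positionally_conserved_chimera(chimera: str, references: list) -> bool:
    """True if every column of `chimera` matches some reference at that column.

    A gap ("-") in the chimera is accepted only where at least one reference
    also has a gap, mirroring "no residue in the chimera when deletions are
    present amongst the reference proteins".
    """
    if len(references) < 2:
        raise ValueError("a chimera needs two or more reference proteins")
    if any(len(ref) != len(chimera) for ref in references):
        raise ValueError("all sequences must share one alignment length")
    for column, residue in enumerate(chimera):
        allowed = {ref[column] for ref in references}
        if residue not in allowed:
            return False
    return True


# Toy aligned references (hypothetical, not the disclosed proteins).
refs = ["ACD-F", "GCDEY"]
print(is_positionally_conserved_chimera("GCD-F", refs),
      is_positionally_conserved_chimera("GCDWF", refs))
```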

Heparan sulfate. As used herein, the term "heparan sulfate" or the abbreviation "HS"
means a polysaccharide of the form ([->4-D-GlcApβ1 or ->4-L-IdoApα1]->4-D-GlcNp[Ac or
S]α1->)n which is modified to a variable extent by sulfation of the 2-O-position of GlcA and IdoA
residues, and the 6-O- and 3-O-positions of GlcN[Ac or S] residues. Therefore, this definition
encompasses all glycosaminoglycan compounds referred to as heparan(s), heparan sulfate(s),
heparin(s), heparin sulfate(s), heparitin(s), heparitin sulfate(s), heparanoid(s), and heparosan(s). The
heparan molecules may be pure glycosaminoglycans or can be linked to other molecules including
other polymers such as proteins and lipids, or small molecules such as biotin.

The Heparan Sulfate D-Glucosaminyl 3-O-Sulfotransferases

The present invention depends, in part, upon the identification and molecular cloning of
cDNAs encoding mammalian heparan sulfate D-glucosaminyl 3-O-sulfotransferases (3-OSTs).
These proteins have been designated 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, and 3-OST-4.
In addition, a nematode 3-OST from C. elegans, ce3-OST, has been identified.

3-OST-1s. Disclosed herein are the isolation and identification of murine and human
3-OST-1 cDNAs (SEQ ID NO: 1 and SEQ ID NO: 3, respectively). The coding regions of these
cDNAs extend from, respectively, nucleotide positions 323-1255 of SEQ ID NO: 1 and positions
119-1039 of SEQ ID NO: 3. The protein coding portions of the cDNAs are 85% identical and
encode proteins of 311 and 307 amino acids (SEQ ID NO: 2 and SEQ ID NO: 4, respectively)
which are 93% similar. The murine and human protein sequences are aligned in Figure 1. Each
protein includes a twenty residue presumptive signal peptide (residues 1-20 of SEQ ID NO: 2 and
SEQ ID NO: 4) which is cleaved off to form the mature form of these proteins. The mouse
3-OST-1 contains an extra four residues (Ala-Pro-Gly-Pro) not found in the human form.
Each protein has five potential N-glycosylation sites (at residues 52-54, 141-143, 196-198,
246-248 and 253-255 of SEQ ID NO: 2, and residues 48-50, 137-139, 192-194, 242-244 and 249-251
of SEQ ID NO: 4). N-glycosylation of at least some of these sites appears important to 3-OST
protein stability, specificity and/or activity. After the 3-OST-1 signal peptide, there is a domain
rich in the residues S, P, L, A, and G (SPLAG-rich domain) (residues 21-52 of SEQ ID NO: 2
and residues 21-48 of SEQ ID NO: 4). 3-OST-1 and all known NST species possess a
homologous carboxy terminal sulfotransferase (ST) domain of ~260 amino acids (residues 53-311
of SEQ ID NO: 2 and residues 49-307 of SEQ ID NO: 4) that exhibits homology to all known
sulfotransferases and which includes the minimal fragment necessary for sulfation activity. Figure
2 shows a sequence alignment of the ST domains of the sulfotransferases NST-1 (SEQ ID NO:
13), NST-2 (SEQ ID NO: 14), OST-1, OST-2, OST-3A/B, and OST-4. Within this region is a
conserved sequence (at residues 260-269 of SEQ ID NO: 2, and 256-265 of SEQ ID NO: 4)
which is a presumptive cysteine-bridged peptide loop thought to be involved in heparan sulfate
substrate specificity. This cysteine-bridged peptide loop is part of the larger HS-binding domain
(residues 250-276 of SEQ ID NO: 2 and 246-272 of SEQ ID NO: 4). A conserved lysine residue
(residue 68 of SEQ ID NO: 2, and 64 of SEQ ID NO: 4) is presumptively catalytic.
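
By way of illustration only, potential N-glycosylation sites of the kind listed above can be located computationally. The following Python sketch assumes that such sites correspond to the canonical N-X-S/T sequon (Asn, then any residue other than Pro, then Ser or Thr). That assumption is consistent with the three-residue spans given above but is not stated in the text. The function name `find_sequons` and the toy peptide are hypothetical and are not taken from any SEQ ID NO.

```python
import re

# Assumed rule: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based inclusive (start, end) spans of N-X-S/T sequons."""
    return [(m.start() + 1, m.start() + 3) for m in SEQUON.finditer(protein)]


# Hypothetical toy peptide, not a disclosed sequence.
toy = "MANGSAPNPTLLNKTQ"
print(find_sequons(toy))  # [(3, 5), (13, 15)]
```

The spans are reported as 1-based inclusive residue ranges so that they can be compared directly with ranges written in the style "residues 52-54".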

The 3-OST-1 proteins have 3-O-sulfotransferase activity on polysaccharide sequences
including the sequence GlcA->GlcNS ±6S, and convert this polysaccharide sequence to the
sequence GlcA->GlcNS 3S ±6S. Of particular importance, the 3-OST-1 proteins are useful in
converting HS^act precursor sequences (i.e., IdoA->GlcNAc 6S->GlcA->GlcNS ±6S->IdoA 2S->
GlcNS 6S; or IdoA->GlcNS 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S) to HS^act. The
3-OST-1 proteins are highly expressed in endothelial cells, brain and kidney tissues, and to a lesser
extent in heart, lung, skeletal muscle and placenta. The human 3-OST-1 gene has been
syntenically localized to chromosome 4, and more particularly to chromosome segment 4p15-16.

3-OST-2s. Also disclosed herein are the isolation and identification of a human 3-OST-2
cDNA (SEQ ID NO: 5). The coding region of this cDNA extends from nucleotide positions
73-1173 of SEQ ID NO: 5. The cDNA encodes a protein of 367 amino acids (SEQ ID NO: 6). The
protein has four potential N-glycosylation sites (at residues 102-104, 193-195, 235-237 and
306-308 of SEQ ID NO: 6). N-glycosylation of at least some of these sites appears important to
3-OST protein stability, specificity and/or activity. The 3-OST-2 protein has a putative N-terminal
cytoplasmic domain (residues 1-19 of SEQ ID NO: 6), followed by a putative transmembrane
domain (residues 20-41 of SEQ ID NO: 6), followed by a SPLAG-rich domain (residues 42-109
of SEQ ID NO: 6). This is followed by the characteristic carboxy terminal ST domain of ~260
amino acids (residues 110-367 of SEQ ID NO: 6) that exhibits homology to all known
sulfotransferases and which includes the minimal fragment necessary for sulfation activity. Within
this region is a conserved sequence (at residues 313-325 of SEQ ID NO: 6) which is a
presumptive cysteine-bridged peptide loop thought to be involved in heparan sulfate substrate
specificity. This cysteine-bridged peptide loop is part of the larger HS-binding domain (residues
303-332 of SEQ ID NO: 6). A conserved lysine residue (residue 124 of SEQ ID NO: 6) is
presumptively catalytic. A cDNA of an allelic variant has also been identified, which includes four
silent nucleotide substitutions (G->A at bp 804, T->G at bp 1249, T->C at bp 1350, and C->T at
bp 1507 of SEQ ID NO: 5) which do not affect the encoded protein.

The 3-OST-2 proteins have 3-O-sulfotransferase activity on polysaccharide sequences
including the sequences GlcA 2S->GlcNS or GlcNS->GlcA 2S->GlcNS, and convert these
polysaccharide sequences to GlcA 2S->GlcNS 3S or GlcNS->GlcA 2S->GlcNS 3S, respectively.
The 3-OST-2 proteins are not expressed in endothelial cells, but are highly expressed in brain
tissues, and to a lesser extent in heart, lung, skeletal muscle and placenta. The human 3-OST-2
gene has been localized to chromosome 16, and more particularly to chromosome segment
16p12.3.

3-OST-3As. Also disclosed herein are the isolation and identification of a human
3-OST-3A cDNA (SEQ ID NO: 7). The coding region of this cDNA extends from nucleotide positions
799-2016 of SEQ ID NO: 7. The cDNA encodes a protein of 406 amino acids (SEQ ID NO: 8).
The protein has two potential N-glycosylation sites (at residues 273-275 and 344-346 of SEQ ID
NO: 8). N-glycosylation of one or more of these sites appears important to 3-OST protein
stability, specificity and/or activity. The 3-OST-3A protein has a putative N-terminal cytoplasmic
domain (residues 1-24 of SEQ ID NO: 8), followed by a putative transmembrane domain
(residues 25-43 of SEQ ID NO: 8), followed by a SPLAG-rich domain (residues 44-147 of SEQ
ID NO: 8). This is followed by the characteristic carboxy terminal ST domain of ~260 amino
acids (residues 148-406 of SEQ ID NO: 8) that exhibits homology to all known sulfotransferases
and which includes the minimal fragment necessary for sulfation activity. Within this region is a
conserved sequence (at residues 351-363 of SEQ ID NO: 8) which is a presumptive
cysteine-bridged peptide loop thought to be involved in heparan sulfate substrate specificity. This
cysteine-bridged peptide loop is part of the larger HS-binding domain (residues 341-370 of SEQ
ID NO: 8). A conserved lysine residue (residue 162 of SEQ ID NO: 8) is presumptively catalytic.

The 3-OST-3A proteins have 3-O-sulfotransferase activity on polysaccharide sequences
including the sequences IdoA 2S->GlcNS or GlcNS->IdoA 2S->GlcNS, and convert these
polysaccharide sequences to IdoA 2S->GlcNS 3S or GlcNS->IdoA 2S->GlcNS 3S, respectively.
The 3-OST-3A proteins are not expressed in endothelial cells, but are highly expressed in kidney,
placenta and liver tissues, and to a lesser extent in brain, heart, lung, and skeletal muscle.

3-OST-3Bs. Also disclosed herein are the isolation and identification of a human
3-OST-3B cDNA (SEQ ID NO: 9). The coding region of this cDNA extends from nucleotide positions
331-1500 of SEQ ID NO: 9. The cDNA encodes a protein of 390 amino acids (SEQ ID NO: 10).
The protein has two potential N-glycosylation sites (at residues 258-260 and 329-331 of SEQ ID
NO: 10). N-glycosylation of one or more of these sites appears important to 3-OST protein
stability, specificity and/or activity. The 3-OST-3B protein has a putative N-terminal cytoplasmic
domain (residues 1-32 of SEQ ID NO: 10), followed by a putative transmembrane domain
(residues 33-65 of SEQ ID NO: 10), followed by a SPLAG-rich domain (residues 66-132 of SEQ
ID NO: 10). This is followed by the characteristic carboxy terminal ST domain of ~260 amino
acids (residues 133-390 of SEQ ID NO: 10) that exhibits homology to all known sulfotransferases
and which includes the minimal fragment necessary for sulfation activity. Within this region is a
conserved sequence (at residues 336-348 of SEQ ID NO: 10) which is a presumptive
cysteine-bridged peptide loop thought to be involved in heparan sulfate substrate specificity. This
cysteine-bridged peptide loop is part of the larger HS-binding domain (residues 326-355 of SEQ
ID NO: 10). A conserved lysine residue (residue 147 of SEQ ID NO: 10) is presumptively
catalytic.

The 3-OST-3B proteins have 3-O-sulfotransferase activity on polysaccharide sequences
including the sequences IdoA 2S->GlcNS or GlcNS->IdoA 2S->GlcNS, and convert these
polysaccharide sequences to IdoA 2S->GlcNS 3S or GlcNS->IdoA 2S->GlcNS 3S, respectively.
The 3-OST-3B proteins are not expressed in endothelial cells, but are highly expressed in kidney,
placenta and liver tissues, and to a lesser extent in brain, heart, lung, and skeletal muscle.
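
Purely as an illustrative aid, the substrate preferences stated above for 3-OST-1, 3-OST-2, 3-OST-3A and 3-OST-3B can be written as simple rewrite rules on a chain of residues. The Python sketch below is a deliberate simplification: it assumes that the only determinant is the uronic acid immediately upstream of the target GlcNS, and it ignores the optional upstream GlcNS and any wider sequence context. The names `Residue`, `UPSTREAM` and `sulfate_3o` are hypothetical and appear nowhere in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    sugar: str                     # "GlcA", "IdoA" or "GlcN"
    mods: frozenset = frozenset()  # e.g. frozenset({"NS", "6S"})


# Uronic acid assumed to be required immediately upstream of the
# target GlcNS, read from the specificities stated in the text.
UPSTREAM = {
    "3-OST-1": Residue("GlcA"),
    "3-OST-2": Residue("GlcA", frozenset({"2S"})),
    "3-OST-3A": Residue("IdoA", frozenset({"2S"})),
    "3-OST-3B": Residue("IdoA", frozenset({"2S"})),
}


def sulfate_3o(chain, enzyme):
    """Return a copy of chain with 3S added wherever the enzyme's rule matches."""
    required = UPSTREAM[enzyme]
    out = list(chain)
    for i in range(1, len(chain)):
        target = chain[i]
        is_glcns = target.sugar == "GlcN" and "NS" in target.mods
        if is_glcns and "3S" not in target.mods and chain[i - 1] == required:
            out[i] = Residue("GlcN", target.mods | {"3S"})
    return out


# Hypothetical tetrasaccharide: GlcA->GlcNS 6S->IdoA 2S->GlcNS
chain = [
    Residue("GlcA"),
    Residue("GlcN", frozenset({"NS", "6S"})),
    Residue("IdoA", frozenset({"2S"})),
    Residue("GlcN", frozenset({"NS"})),
]
for enzyme in ("3-OST-1", "3-OST-3A"):
    product = sulfate_3o(chain, enzyme)
    modified = [i + 1 for i, r in enumerate(product) if "3S" in r.mods]
    print(enzyme, "adds 3S at residue(s)", modified)
```

In this toy chain the 3-OST-1 rule marks the second residue and the 3-OST-3A rule marks the fourth, which mirrors the distinction drawn above between GlcA-linked and IdoA 2S-linked GlcNS targets.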

3-OST-4s. Also disclosed herein are the isolation and identification of a human 3-OST-4
nucleic acid sequence (SEQ ID NO: 11). This sequence represents a possible or predicted
heteronuclear RNA species, and is a composite of 5' genomic sequence information and an
overlapping partial cDNA. The coding region of this sequence extends from nucleotide positions
847-2214 of SEQ ID NO: 11, and encodes a protein of 456 amino acids (SEQ ID NO: 12). The
protein has two potential N-glycosylation sites (at residues 318-320 and 389-391 of SEQ ID NO:
12). N-glycosylation of one or more of these sites appears important to 3-OST protein stability,
specificity and/or activity. The 3-OST-4 protein includes the characteristic carboxy terminal ST
domain of ~260 residues (residues 207-456 of SEQ ID NO: 12) that exhibits homology to all known
sulfotransferases and which includes the minimal fragment necessary for sulfation activity. Within
this region is a conserved sequence (at residues 396-408 of SEQ ID NO: 12) which is a
presumptive cysteine-bridged peptide loop thought to be positioned near the active site. This
cysteine-bridged peptide loop is part of the larger HS-binding domain (residues 386-415 of SEQ
ID NO: 12). A conserved lysine residue (residue 207 of SEQ ID NO: 12) is presumptively
catalytic.

The 3-OST-4 proteins have sulfotransferase activity, but the sequence specificity of this
activity has not yet been determined. The 3-OST-4 proteins appear to be expressed at detectable
levels only in the brain. The human 3-OST-4 gene has been localized to chromosome 16, and
more particularly to chromosome segment 16p11.
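
As a simple consistency check, the coding-region coordinates and protein lengths stated above for the murine and human 3-OSTs can be compared arithmetically. The Python sketch below is offered under two assumptions: that the nucleotide ranges are 1-based and inclusive, and that each codon encodes one residue. The helper name `codon_count` is hypothetical. For every entry the range divides evenly into exactly the stated number of residues, which suggests, as an inference only, that the stated ranges do not include a termination codon.

```python
# (first nucleotide, last nucleotide, stated protein length), as given in the text.
CODING_REGIONS = {
    "murine 3-OST-1 (SEQ ID NO: 1)": (323, 1255, 311),
    "human 3-OST-1 (SEQ ID NO: 3)": (119, 1039, 307),
    "human 3-OST-2 (SEQ ID NO: 5)": (73, 1173, 367),
    "human 3-OST-3A (SEQ ID NO: 7)": (799, 2016, 406),
    "human 3-OST-3B (SEQ ID NO: 9)": (331, 1500, 390),
    "human 3-OST-4 (SEQ ID NO: 11)": (847, 2214, 456),
}


def codon_count(first, last):
    """Number of whole codons in a 1-based inclusive nucleotide range."""
    length = last - first + 1
    if length % 3 != 0:
        raise ValueError(f"range {first}-{last} is not a multiple of 3")
    return length // 3


for name, (first, last, residues) in CODING_REGIONS.items():
    codons = codon_count(first, last)
    status = "consistent" if codons == residues else "MISMATCH"
    print(f"{name}: {last - first + 1} nt = {codons} codons, "
          f"stated {residues} residues ({status})")
```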

C. elegans 3-OSTs. Also disclosed herein is the identification of a C. elegans homologue
of the human 3-OSTs, ce3-OST. This protein is disclosed as SEQ ID NO: 15, and includes the
characteristic carboxy terminal ST domain of ~260 residues (residues 23-291 of SEQ ID NO: 15)
that exhibits homology to all known sulfotransferases and which includes the minimal fragment
necessary for sulfation activity. Within this region is a conserved sequence (at residues 240-250
of SEQ ID NO: 15) which is a presumptive cysteine-bridged peptide loop thought to be
positioned near the active site. This cysteine-bridged peptide loop is part of the larger HS-binding
domain (residues 230-257 of SEQ ID NO: 15). A conserved lysine residue (residue 38 of SEQ
ID NO: 15) is presumptively catalytic.

The C. elegans 3-OST proteins have sulfotransferase activity, but the sequence specificity
of this activity has not yet been determined. BLAST and Genefinder analysis of genomic cosmids
predicts that ce3-OST is an intraluminal resident protein of 291 residues encoded by 4 exons
(clone F52B10, GBan U41990; residues 26317-26090, 21886-21732, 21682-21395, and
21345-21140).






WO 99/22005



PCT/US98/22597 




The homology between the sulfotransferase domain of the ce3-OST and the human 3-OST
and NST proteins is illustrated in Fig. 2. Based on this sequence alignment, one may also produce
chimeric proteins between the C. elegans protein and its human homologues.
Isolated Nucleic Acids 

In one aspect, the present invention provides isolated nucleic acids encoding 3-OST
proteins or functional fragments thereof. In preferred embodiments, the 3-OST proteins are
3-OST-1 proteins, 3-OST-2 proteins, 3-OST-3A proteins, 3-OST-3B proteins, 3-OST-4 proteins,
or ce3-OST proteins. In particularly preferred embodiments, the 3-OST proteins are those
disclosed as SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10,
SEQ ID NO: 12, or SEQ ID NO: 15. As shown in the examples below, the isolated nucleic acids
encoding all or a portion of one mammalian 3-OST protein may be used to isolate homologues in
other species by standard techniques known to those of ordinary skill in the art. Thus, the present
invention also enables isolated nucleic acids encoding the 3-OST proteins of other mammalian
species including, for example, rats, goats, sheep, cows, pigs, and non-human primates. Similarly,
the isolated nucleic acids disclosed herein may be used to screen additional human or other
mammalian genetic libraries (e.g., genomic or cDNA libraries) to identify allelic variants of the
particularly disclosed sequences. Thus, the present invention also enables isolated nucleic acids
encoding human and other mammalian 3-OST allelic variants.

In another aspect, the present invention provides isolated nucleic acids encoding functional
fragments of 3-OST proteins, 3-OST protein variants in which conservative substitutions have
been made for certain residues, or encoding chimeric 3-OST proteins in which the sequences of
two or more 3-OST proteins have been mixed, to produce non-naturally occurring variants which
retain sequence-specific HS binding affinity and/or 3-O-sulfotransferase activity. The preferred
amino acid sequences of such variants are described below.

In preferred embodiments, the isolated nucleic acids encoding a mammalian 3-OST or
functional fragment thereof have at least 60%, preferably at least 70%, and more preferably at
least 80% nucleotide sequence identity to the coding regions of the mammalian 3-OST sequences
particularly disclosed herein (SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7,
SEQ ID NO: 9 and SEQ ID NO: 11), and encode at least a functional fragment having
sequence-specific HS binding affinity and/or 3-O-sulfotransferase activity. Most preferably, the
sequences have at least 90% or 95% nucleotide sequence identity to the disclosed reference
sequences.
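
For illustration, nucleotide sequence identity between two already-aligned coding sequences may be computed as sketched below in Python. The text does not specify an alignment method or the denominator to be used, so the choices made here (a pre-computed pairwise alignment with "-" as the gap character, and identity measured over gap-free columns only) are assumptions, as are the function name `percent_identity` and the toy sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over gap-free columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical 10-nt alignment with one mismatch.
identity = percent_identity("ATGGCCAAGT", "ATGGCTAAGT")
print(identity)  # 90.0
for threshold in (60, 70, 80, 90, 95):
    print(f"at least {threshold}%:", identity >= threshold)
```

Other conventions (for example, counting gap columns as mismatches) give lower values for the same alignment, so the convention should be fixed before a sequence is compared against the thresholds recited above.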
As will be apparent to one of ordinary skill in the art, the degeneracy of the genetic code
allows for numerous nucleotide substitutions in a given coding sequence which do not affect the
amino acid sequence of the encoded protein. Thus, the present invention also provides for
isolated nucleic acids which differ from any of the above-described sequences only by the
substitution of such synonymous codons.
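
The effect of synonymous codon substitution can be shown with a short Python sketch. It builds the standard genetic code table and translates two hypothetical 12-nucleotide sequences that differ at three third-codon positions. The sequences and the function name `translate` are illustrative only and are not drawn from any SEQ ID NO.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


original = "ATGGCCAAAGGT"  # hypothetical
variant = "ATGGCTAAGGGC"   # differs at three third-codon positions

print(translate(original))  # MAKG
print(translate(variant))   # MAKG
```

Both sequences encode the same four residues even though they are only 75% identical at the nucleotide level, which is why sequences differing only by synonymous codons are treated together above.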

The isolated nucleic acids of the present invention may be joined to other nucleic acid
sequences for use in various applications. Thus, for example, the isolated nucleic acids of the
invention may be ligated into cloning or expression vectors, as are commonly known in the art
and as described in the examples below. In addition, the nucleic acids of the invention may be
joined in-frame to sequences encoding another polypeptide so as to form a fusion protein, as is
commonly known in the art and as described in the examples below. Thus, in certain
embodiments, the present invention provides cloning, expression and fusion vectors comprising
any of the above-described nucleic acids.

In another aspect, the isolated nucleic acids of the present invention may comprise only a
portion of a nucleotide sequence encoding a complete mammalian 3-OST protein. For example,
and as described more fully below, the 3-OST-1 proteins comprise a signal sequence which is
removed post-translationally to yield the mature proteins. In some instances (e.g., when
translating 3-OST-1 proteins in vitro), it may be preferable to employ an isolated nucleic acid
which encodes only the mature protein. In addition, the four C-terminal residues of 3-OST-1 are
believed to be involved in localization of the protein within the Golgi apparatus. In some
instances (e.g., when encoding 3-OST-1 proteins for use in vitro), it may be preferable to employ
an isolated nucleic acid which does not encode these residues, as they will be unnecessary for in
vitro function. As described above, an approximately 260 residue portion of the 3-OST proteins
includes the catalytically active region (ST domain) and, therefore, it may be preferable to employ
an isolated nucleic acid which encodes only this functional fragment which retains
3-O-sulfotransferase activity. Thus, in certain preferred embodiments, the present invention provides
isolated nucleic acid sequences encoding mature forms of a mammalian 3-OST-1 protein,
C-terminally truncated forms of the 3-OST proteins, or minimal functional fragments of the 3-OST
proteins. In addition, as described above, these sequences may also encode conservative
substitution variants or chimeras of 3-OST proteins, and may include synonymous codon
substitutions.
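
The three kinds of partial sequence described above (mature form, C-terminally truncated form, and ST-domain fragment) amount to simple coordinate operations on the full-length protein. The Python sketch below applies them to a placeholder string having the length of human 3-OST-1 (307 residues) and uses the residue numbers given earlier for SEQ ID NO: 4. The placeholder is not the actual sequence, and the helper names are hypothetical.

```python
def subsequence(protein, first, last):
    """Return residues first..last using 1-based inclusive numbering."""
    if not 1 <= first <= last <= len(protein):
        raise ValueError("coordinates fall outside the protein")
    return protein[first - 1:last]


def mature_form(protein, signal_len=20):
    """Remove an N-terminal signal peptide of signal_len residues."""
    return protein[signal_len:]


def c_terminal_truncation(protein, n=4):
    """Remove the n C-terminal residues."""
    return protein[:len(protein) - n]


# Placeholder of the right length only; NOT the sequence of SEQ ID NO: 4.
placeholder = "M" + "A" * 306

print(len(mature_form(placeholder)))            # 287 (residues 21-307)
print(len(c_terminal_truncation(placeholder)))  # 303 (residues 1-303)
print(len(subsequence(placeholder, 49, 307)))   # 259 (ST domain, residues 49-307)
```

The corresponding nucleic acid fragment for protein residues a..b would span three nucleotides per residue, offset from the first nucleotide of the coding region.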
In another aspect, the present invention provides for nucleic acids which comprise a
sequence of at least 16-18, preferably 18-20 consecutive nucleotides from any one of SEQ ID
NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 and SEQ ID NO: 11.
Such nucleic acid sequences have utility for determining the levels of expression of 3-OST
transcripts in cells or tissues, for identifying tissues in which the 3-OST genes are differentially
expressed (see above), for encoding peptide fragments which may be used to raise antibodies to
corresponding regions of the 3-OST proteins, for identifying chromosomes bearing the corresponding
3-OST sequences (see above), for priming polymerase chain reaction amplification of 3-OST
sequences (e.g., prior to in vitro translation, see below), and for various other utilities which will
be apparent to those skilled in the art. Particularly preferred sequences for PCR amplification
include those which are 5' to and/or include the initiation codon, which are 5' to and/or include the
codons encoding the signal peptide cleavage site, or which are 3' to and/or include the termination
codon. Sequences useful for encoding peptide fragments include those which are located within
the coding region.

Cell Lines and Transgenic Animals

The present invention also provides for cells or cell lines, both prokaryotic and eukaryotic,
into which have been introduced the nucleic acids of the present invention so as to cause clonal
propagation of those nucleic acids and/or expression of the proteins or peptides encoded thereby.
Such cells or cell lines have utility in the propagation and production of the nucleic acids of the
invention, as well as the production of the proteins of the present invention. As used herein, the
term "transformed cell" is intended to embrace any cell, or the descendant of any cell, into which
has been introduced any of the nucleic acids of the invention, whether by transformation,
transfection, transduction, infection, or other means. Methods of producing appropriate vectors,
transforming cells with those vectors, and identifying transformants are well known in the art and
are only briefly reviewed here (see, for example, Sambrook et al. (1989) Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New
York).

Prokaryotic cells useful for producing the transformed cells of the invention include
members of the bacterial genera Escherichia (e.g., E. coli), Pseudomonas (e.g., P. aeruginosa),
and Bacillus (e.g., B. subtilis, B. stearothermophilus), as well as many others well known and
frequently used in the art. Prokaryotic cells are particularly useful for the production of large
quantities of the proteins or peptides of the invention (e.g., naturally occurring or synthetic
3-OSTs, fragments of the 3-OSTs, fusion proteins of the 3-OSTs). Bacterial cells (e.g., E. coli)
may be used with a variety of expression vector systems including, for example, plasmids with the
T7 RNA polymerase/promoter system, bacteriophage λ regulatory sequences, or M13 phage
regulatory elements. Bacterial hosts may also be transformed with fusion protein vectors which
create, for example, Protein A, lacZ, trpE, maltose-binding protein, poly-His tag, or
glutathione-S-transferase fusion proteins. All of these, as well as many other prokaryotic expression
systems, are well known in the art and widely available commercially (e.g., pGEX-2T (Amrad, USA)
for GST fusions).

Eukaryotic cells and cell lines useful for producing the transformed cells of the invention
include mammalian cells (e.g., endothelial cells, mast cells, COS cells, CHO cells, fibroblasts,
hybridomas, oocytes, embryonic stem cells), insect cell lines (e.g., Drosophila Schneider cells),
yeast, and fungi. Eukaryotic cells are particularly useful for embodiments in which it is necessary
that the 3-OST proteins, or functional fragments thereof, be properly post-translationally modified
(e.g., N-glycosylated) because N-glycosylation of these proteins appears to be important to their
stability and/or activity. Currently preferred cells are mammalian cells and, in particular, COS-7
cells, CHO cells, murine primary cardiac microvascular endothelial cells (CME), murine mast cell
line C57.1, human primary endothelial cells of umbilical vein (HUVEC), F9 embryonal carcinoma
cells, rat fat pad endothelial cells (RFPEC), L cells (e.g., murine LTA tk- cells), and cells derived
from the transgenic animals of the invention.

To accomplish expression in eukaryotic cells, a wide variety of vectors have been
developed and are commercially available which allow inducible (e.g., LacSwitch expression
vectors, Stratagene, La Jolla, CA) or constitutive (e.g., pcDNA3 vectors, Invitrogen, Chatsworth,
CA) expression of 3-OST nucleotide sequences under the regulation of an artificial promoter
element. Such promoter elements are often derived from CMV or SV40 viral genes, although
other strong promoter elements which are active in eukaryotic cells can also be employed to
induce transcription of 3-OST nucleotide sequences. Typically, these vectors also contain an
artificial polyadenylation sequence and 3' UTR which can also be derived from exogenous viral
gene sequences or from other eukaryotic genes. These expression systems are commonly
available from commercial sources and are typified by vectors such as pcDNA3 and pZeoSV
(Invitrogen, San Diego, CA). As described below, the vector pcDNA3 has been successfully used
to cause expression of 3-OST-1 proteins in transfected COS-7 cells. Numerous expression
vectors are available from commercial sources to allow expression of any desired 3-OST
transcript in more or less any desired cell type, either constitutively or after exposure to a certain
exogenous stimulus (e.g., withdrawal of tetracycline or exposure to IPTG).

Vectors may be introduced into the recipient or "host" cells by various methods well
known in the art including, but not limited to, calcium phosphate transfection, strontium
phosphate transfection, DEAE dextran transfection, electroporation, lipofection, microinjection,
ballistic insertion on micro-beads, protoplast fusion or, for viral or phage vectors, by infection
with the recombinant virus or phage.
Transgenic Animal Models 

The present invention also provides for the production of transgenic non-human
animal models in which wild type, allelic variant, chimeric, or antisense 3-OST sequences are
expressed, or in which 3-OST sequences have been inactivated or deleted (e.g., "knock-out"
constructs) or replaced with reporter or marker genes (e.g., "knock-in reporter construct"). The
3-OST sequences may be conspecific to the transgenic animal (e.g., murine sequences in a
transgenic mouse) or transpecific to the transgenic animal (e.g., human sequence in a transgenic
mouse). In such a transgenic animal, the transgenic sequences may be expressed inducibly,
constitutively or ectopically. Expression may be tissue-specific or organism-wide. Engineered
expression of 3-OST sequences in tissues and cells not normally containing 3-OST gene products
may cause novel alterations of heparan polysaccharide structure and lead to novel cell or tissue
phenotypes. Ectopic or altered levels of expression of 3-OST sequences may alter cell, tissue
and/or developmental phenotypes. Transgenic animals are useful as models of thromboembolic
and other disorders arising from defects in heparan sulfate biosynthesis or metabolism.
Transgenic animals are also useful for screening compounds for their effects on HS biosynthesis
mediated by 3-OSTs. Transgenic animals transformed with reporter constructs may be used to
measure the transcriptional effects of small molecules, drugs, protein physiological mediators,
carbohydrate effectors, mimetic compounds or physical perturbations on the expression of 3-OST
loci in vivo. The transgenic animals of the invention may be used to screen such compounds for
therapeutic utility.

Animal species suitable for use in the animal models of the present invention include,
but are not limited to, rats, mice, hamsters, guinea pigs, rabbits, dogs, cats, goats, sheep, pigs, and
non-human primates (e.g., Rhesus monkeys, chimpanzees). For initial studies, transgenic rodents
(e.g., mice) are preferred due to their relative ease of maintenance and shorter life spans.
Transgenic non-human primates may be preferred for longer term studies due to their greater
similarity to humans and their higher cognitive abilities.

Using the nucleic acids disclosed and otherwise enabled herein, there are now several
available approaches for the creation of a transgenic animal. Thus, the enabled animal models
include: (1) animals in which sequences encoding at least a functional fragment of a wild type
3-OST gene have been recombinantly introduced into the genome of the animal as an additional gene,
under the regulation of either an exogenous or an endogenous promoter element, and as either a
minigene (i.e., a genetic construct of the 3-OST with the introns, if any, removed) or a large
genomic fragment; (2) animals in which sequences encoding at least a functional fragment of a
normal 3-OST gene have been recombinantly substituted for one or both copies of the animal's
homologous 3-OST gene by homologous recombination or gene targeting; (3) animals in which
one or both copies of one of the animal's homologous 3-OST genes have been recombinantly
"humanized" by the partial substitution of sequences encoding the human homologue by
homologous recombination or gene targeting; (4) animals in which sequences encoding 3-OST
transcriptional elements linked to a reporter gene have replaced the endogenous 3-OST gene and
transcriptional elements; (5) "knock-out" animals in which one or both copies of the animal's
3-OST sequences have been partially or completely deleted or have been inactivated by the insertion
or substitution by homologous recombination or gene targeting of exogenous sequences (e.g.,
stop codons); and (6) animals in which additional genes related to the biosynthesis or metabolism of
heparan sulfates have been altered (e.g., a murine transgenic in which all of the genes in the HS
pathway have been humanized). These and other transgenic animals of the invention are useful as
models of thromboembolic and other disorders arising from defects in heparan sulfate biosynthesis
or metabolism. These animals are also useful for screening compounds for their effects on HS
biosynthesis mediated by 3-OSTs.

To produce an animal model (e.g., a transgenic mouse), a wild type or allelic variant
3-OST sequence or a wild type or allelic variant of a recombinant nucleic acid encoding at least a
functional fragment of a 3-OST is preferably inserted into a germ line or stem cell using standard
techniques of oocyte or embryonic stem cell microinjection, or other form of transformation of
such cells. Alternatively, other cells from an adult organism may be employed. Animals produced by
these or similar processes are referred to as transgenic. Similarly, if it is desired to inactivate or
replace an endogenous 3-OST sequence, homologous recombination using oocytes, embryonic
stem or other cells may be employed. Animals produced by these or similar processes are referred
to as "knock-out" (inactivation) or "knock-in" (replacement) models.

For oocyte injection, one or more copies of the recombinant DNA constructs of the
present invention may be inserted into the pronucleus of a just-fertilized oocyte. This oocyte is
then reimplanted into a pseudo-pregnant foster mother. The liveborn animals are screened for
integrants using analysis of DNA (e.g., from the tail veins of offspring mice) for the presence of
the inserted recombinant transgene sequences. The transgene may be either a complete genomic
sequence introduced into a host as a YAC, BAC or other chromosome DNA fragment, a cDNA
with either the natural promoter or a heterologous promoter, or a minigene containing all of the
coding region and other elements found to be necessary for optimum expression.

To create a transgene, the target sequence of interest (e.g., a wild type or allelic variant
3-OST sequence) is typically ligated into a cloning site located downstream of some promoter
element which will regulate the expression of RNA from the sequence. Downstream of the
coding sequence, there is typically an artificial polyadenylation sequence. An alternative approach
to creating a transgene is to use an exogenous promoter and regulatory sequences to drive
expression of the transgene. Finally, it is possible to create transgenes using large genomic DNA
fragments such as YACs which contain the entire desired gene as well as its appropriate
regulatory sequences.

Animal models may be created by targeting endogenous 3-OST sequence in order to
alter the endogenous sequence by homologous recombination. These targeting events can have
the effect of removing endogenous sequence (knock-out) or altering the endogenous sequence to
create an amino acid change associated with human disease or an otherwise abnormal sequence
(e.g., a sequence which is more like the human sequence than the original animal sequence)
(knock-in animal models). A large number of vectors are available to accomplish this and
appropriate sources of genomic DNA for mouse and other animal genomes to be targeted are
commercially available from companies such as GenomeSystems Inc. (St. Louis, Missouri, USA).
The typical feature of these targeting vector constructs is that 2 to 4 kb of genomic DNA is
ligated 5' to a selectable marker (e.g., a bacterial neomycin resistance gene under its own
promoter element termed a "neomycin cassette"). A second DNA fragment from the gene of
interest is then ligated downstream of the neomycin cassette but upstream of a second selectable
marker (e.g., thymidine kinase). The DNA fragments are chosen such that mutant sequences can
be introduced into the germ line of the targeted animal by homologous replacement of the
endogenous sequences by either one of the sequences included in the vector. Alternatively, the
sequences can be chosen to cause deletion of sequences that would normally reside between the
left and right arms of the vector surrounding the neomycin cassette. The former is known as a
knock-in, the latter is known as a knock-out.

Retroviral infection of early embryos can also be done to insert the recombinant DNA
constructs of the invention. In this method, the transgene (e.g., a wild type or allelic variant
3-OST sequence) is inserted into a retroviral vector which is used to directly infect embryos (e.g.,
mouse or non-human primate embryos) during the early stages of development to generate
partially transgenic animals, some of which bear the transgenes in germline cells.

Alternatively, homologous recombination using a population of stem cells allows for
the screening of the population for successful transformants. Once identified, these can be
injected into blastocysts, and a proportion of the resulting animals will show germline 
transmission of the transgene. 

Techniques of generating transgenic animals, as well as techniques for homologous
recombination or gene targeting, are now widely accepted and practiced. A laboratory manual on
the manipulation of the mouse embryo, for example, is available detailing standard laboratory 
techniques for the production of transgenic mice (69). 

Finally, equivalents of transgenic animals, including animals with mutated or inactivated
3-OST sequences, may be produced using chemical or x-ray mutagenesis of gametes, followed by
fertilization. Using the isolated nucleic acids disclosed or otherwise enabled herein, one of
ordinary skill may more rapidly screen the resulting offspring by, for example, direct sequencing,
SSCP, RFLP, PCR, or hybridization analysis to detect mutants, or Southern blotting to
demonstrate loss of one allele by dosage.
Identifying Modulators of 3-OST Expression 

In another set of embodiments, the present invention provides isolated nucleic acids
comprising genetic regulatory sequences of a 3-OST gene operably joined to a marker gene.
Such regulatory sequences include 5' untranslated regions such as promoter and operator
sequences. The 5' regulatory sequences of the human 3-OST-4 gene (as well as coding regions)
are disclosed herein as SEQ ID NO: 16. Such regulatory regions may be used to transform host
cells, which are useful in methods of identifying compounds capable of modulating the expression
of the 3-OST gene. Thus, in such methods, a candidate compound is contacted with a host cell
transformed with a marker gene operably joined to the 3-OST regulatory regions, and changes in
expression of the marker gene are indicative of the ability of the candidate compound to modulate
3-OST expression. Such methods may also be performed using the transgenic animals of the
invention.

Substantially Pure Proteins 
In one aspect, the present invention provides substantially pure preparations of 3-OST
proteins. In preferred embodiments, the 3-OST proteins are 3-OST-1, 3-OST-2, 3-OST-3A,
3-OST-3B, 3-OST-4 or ce3-OST proteins. In particularly preferred embodiments, the 3-OST
proteins are those disclosed as SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8,
SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 15. As shown in the examples below, nucleic
acids encoding all or a portion of one mammalian 3-OST protein may be used to isolate
homologues in other species by standard techniques known to those of ordinary skill in the art.
Thus, the present invention also enables substantially pure protein preparations of 3-OST proteins
of other mammalian species including, for example, rats, goats, sheep, cows, pigs, and non-human
primates. Similarly, the isolated nucleic acids disclosed herein may be used to screen additional
human or other mammalian genetic libraries (e.g., genomic or cDNA libraries) to identify allelic
variants of the particularly disclosed sequences. Thus, the present invention also enables
substantially pure protein preparations of human and other mammalian 3-OST allelic variants.

In another aspect, the present invention provides 3-OST protein variants in which
conservative substitutions have been made for certain residues, or chimeric 3-OST proteins in
which the sequences of various 3-OST proteins have been mixed, to produce non-naturally
occurring variants which retain 3-O-sulfotransferase activity. Conservative substitutions are
preferably made in those regions of the proteins which are already known to vary amongst the
human and murine sequences (see Figure 1) or between the 3-OST-1, 3-OST-2, 3-OST-3A,
3-OST-3B, 3-OST-4, and ce3-OST proteins (see, e.g., Figure 2). Substitutions are to be avoided in
those areas which have been implicated in catalysis (see above). Chimeric 3-OST proteins may be
made using the disclosed sequences as reference sequences, and these chimeras may also be
subjected to conservative substitutions as described above. In addition, based upon the
homologies of the 3-OST proteins to other glucosaminyl sulfotransferases (e.g., 2-OST, NST-1,
NST-2), one of ordinary skill in the art may produce chimeric 3-OSTs using those proteins as
reference sequences (see, e.g., Figure 2).

In preferred embodiments, the 3-OST proteins have at least 60%, preferably at least
70%, and more preferably at least 80% amino acid sequence similarity to the mammalian 3-OST
sequences particularly disclosed herein, and retain 3-O-sulfotransferase activity. Most preferably,
the sequences have at least 90% or 95% amino acid sequence similarity to the disclosed reference
sequences. Such sequences may be routinely produced by those of ordinary skill in the art, and
3-O-sulfotransferase activity may be tested by routine methods such as those disclosed herein.
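
As a rough illustration of the similarity thresholds above, the following Python sketch scores two pre-aligned amino acid sequences and reports the highest stated threshold they meet. The conservative-substitution groups, the function names, and the treatment of gaps are assumptions made for illustration only; the specification does not prescribe a particular similarity measure or alignment method.

```python
# Illustrative conservative-substitution groups (an assumption, not from the specification).
CONSERVATIVE_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "ST", "NQ"]
GROUP_OF = {res: idx for idx, group in enumerate(CONSERVATIVE_GROUPS) for res in group}

# Similarity thresholds named in the text, highest first.
THRESHOLDS = (95, 90, 80, 70, 60)


def percent_similarity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, ungapped positions that are identical or conservative."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped in this simplified score
        compared += 1
        same_group = a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b)
        if a == b or same_group:
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * similar / compared


def highest_threshold_met(pct: float):
    """Return the highest threshold (in percent) that pct satisfies, or None."""
    for threshold in THRESHOLDS:
        if pct >= threshold:
            return threshold
    return None
```

A candidate variant scoring 85% against a reference sequence would fall in the "at least 80%" tier under this scheme. A production comparison would normally use an established alignment tool and substitution matrix rather than these hand-picked groups.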



The substantially pure proteins of the present invention may be joined to other polypeptide
sequences for use in various applications. Thus, for example, the proteins of the invention may be
joined to one or more additional polypeptides so as to form a fusion protein, as is commonly
known in the art and as described in the examples below. The additional polypeptides may be
joined to the N-terminus, C-terminus or both termini of the 3-OST protein. Such fusion proteins
may be particularly useful if the additional polypeptide sequences are easily identified (e.g., by
providing an antigenic determinant) or easily purified (e.g., by providing a ligand for affinity
purification).

In another aspect, the substantially pure 3-OST proteins of the present invention may
comprise only a portion or fragment of the amino acid sequence of a complete mammalian 3-OST
protein. For example, as described above, the 3-OST-1 proteins comprise a twenty amino acid
signal sequence which is removed post-translationally to yield the mature proteins. In some
instances (e.g., when employing 3-OST-1 proteins in vitro), it may be preferable to employ only
the mature protein or a minimal fragment retaining 3-O-sulfotransferase activity. In addition, the
four C-terminal residues of 3-OST-1 may be involved in localization of the protein within the
Golgi apparatus. In some instances (e.g., when employing 3-OST-1 proteins in vitro), it may be
preferable to employ a 3-OST-1 protein which does not include these residues, as they will be
unnecessary for in vitro function. As described above, an approximately 260 amino acid portion
of the 3-OST proteins includes the catalytically active region and, therefore, it may be preferable
to employ a 3-OST protein which includes only this functional fragment which retains
3-O-sulfotransferase activity. Thus, in certain preferred embodiments, the present invention provides
substantially pure 3-OST proteins including mature forms of a mammalian 3-OST-1 protein,
C-terminally truncated forms, or minimal functional fragments thereof. In addition, as described
above, these proteins may also comprise conservative substitution variants or chimeras of 3-OST
proteins.

In another aspect, the present invention provides for substantially pure protein
preparations which comprise a sequence of at least 6-12, preferably 10-16, more preferably 16-22
consecutive amino acid residues from any one of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6,
SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, and SEQ ID NO: 15. Such polypeptides have
utility to raise antibodies to corresponding regions of the 3-OST proteins. In particular, an 
analysis of the amino acid sequences of the 3-OST proteins suggests that there are regions which 
will have particular utility in generating antibodies. Thus, in preferred embodiments, the
invention provides antigenic 3-OST polypeptides selected from the group consisting of (a)
residues 4-29, 144-152, 208-222, 31-42, 155-181, 72-94, 195-205, 278-293, 113-136, 56-66, 
230-245, 257-263, 301-306, 267-272 and 101-107 of SEQ ID NO: 2; (b) residues 4-22, 140-148,
205-218, 68-90, 191-201, 274-289, 110-133, 51-62, 226-241, 253-259, 151-163, 168-181,
297-302, 27-34, 97-107 and 263-268 of SEQ ID NO: 4; (c) residues 18-44, 199-207, 114-123,
319-328, 250-275, 238-246, 128-143, 47-59, 83-98, 332-349, 178-186, 289-295, 310-316,
63-76, 4-9, 209-218, 170-176 and 300-305 of SEQ ID NO: 6; (d) residues 22-57, 236-256,
166-186, 151-161, 138-147, 77-85, 348-354, 87-94, 323-335, 360-366, 284-314, 217-224, 376-383,
4-20, 130-136, 67-73, 389-395 and 338-343 of SEQ ID NO: 8; (e) residues 221-241, 8-66,
151-171, 135-146, 333-339, 308-320, 345-351, 269-299, 202-209, 361-368, 86-100, 71-80, 115-129,
374-380 and 323-328 of SEQ ID NO: 10; and (f) residues 280-290, 321-364, 371-388,
211-231, 393-399, 310-316, 421-438, 405-411, 262-268 and 292-301 of SEQ ID NO: 12. Note that
these polypeptides are listed in decreasing order of preference within each group (a) to (f).
Preferred antigenic peptide sequences also include residues 218-231, 87-100, 167-180 and
275-288 of SEQ ID NO: 2, which have been successfully used to generate antibodies to m3-OST-1.
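
Residue ranges of the kind listed above are 1-based and inclusive, so a candidate antigenic peptide can be cut from a protein sequence by simple slicing. The sketch below is a minimal Python illustration; the function name and the toy sequence are hypothetical, and the actual SEQ ID NO sequences are not reproduced here.

```python
def extract_peptides(sequence: str, residue_ranges):
    """Return the peptides for 1-based, inclusive (start, end) residue ranges."""
    peptides = []
    for start, end in residue_ranges:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"range {start}-{end} is outside the sequence")
        peptides.append(sequence[start - 1:end])  # convert to 0-based slicing
    return peptides


# Toy 20-residue sequence for demonstration only, NOT a real 3-OST sequence.
toy_sequence = "MKTLLVAGSWQERTYHNDCF"
peptides = extract_peptides(toy_sequence, [(4, 9), (12, 15)])
```

Keeping the ranges in their listed order preserves the stated order of preference within each group.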

Thus, in another aspect, the present invention provides for antibodies and methods for
making antibodies which selectively bind with the 3-OST proteins. These antibodies include
monoclonal and polyclonal antibodies, as well as functional antibody fragments such as F(ab) and 
Fc. 

The proteins or peptides of the invention may be substantially purified by any of a variety
of methods selected on the basis of the properties revealed by their protein sequences. As shown
in the examples below, and previously described (26), cells naturally expressing 3-OST-1 proteins
secrete the protein when grown in culture, and the proteins may be isolated from the cell culture
medium. The 3-OST-2, 3-OST-3A, 3-OST-3B and 3-OST-4 proteins, however, appear to
include transmembrane domains. Thus, these proteins are not expected to be secreted at high
levels. Because the 3-OSTs are found in the Golgi apparatus and microsomal bodies of cells
which naturally express them, a fraction of cells including these organelles may be isolated and the
proteins may be extracted from this fraction by, for example, detergent solubilization.
Alternatively, the 3-OST proteins, fusion proteins, or fragments thereof, may be purified from cells
transformed or transfected with expression vectors. For example, insect cells such as Drosophila
Schneider cells and baculovirus expression systems may be employed with vectors such as
pPBLUEBAC and pMELBAC (Stratagene, La Jolla, CA); yeast expression systems with vectors
such as pYESHIS Xpress vectors (Invitrogen, San Diego, CA); eukaryotic expression systems
with vectors such as pcDNA3 (Invitrogen, San Diego, CA), which causes constitutive expression,
or LacSwitch (Stratagene, La Jolla, CA) which is inducible; or prokaryotic expression systems
with vectors such as pKK233-3 (Clontech, Palo Alto, CA). In the event that the protein or
fragment localizes within microsomes derived from the Golgi apparatus, endoplasmic reticulum,
or other membrane containing structures of such cells, the protein may be purified from the
appropriate cell fraction. Alternatively, if the protein does not localize within these structures, or
aggregates in inclusion bodies within the recombinant cells (e.g., prokaryotic cells), the protein
may be purified from whole lysed cells or from solubilized inclusion bodies by standard means.

Purification can be achieved using standard protein purification procedures including, but
not limited to, affinity chromatography, gel-filtration chromatography, ion-exchange
chromatography, high-performance liquid chromatography (RP-HPLC, ion-exchange HPLC,
size-exclusion HPLC), high-performance chromatofocusing chromatography, hydrophobic interaction
chromatography, immunoprecipitation, or immunoaffinity purification. Gel electrophoresis (e.g.,
PAGE, SDS-PAGE) can also be used to isolate a protein or peptide based on its molecular
weight, charge properties and hydrophobicity.

A 3-OST protein, or a fragment thereof, may also be conveniently purified by creating a
fusion protein including the desired 3-OST sequence fused to another peptide such as an antigenic
determinant (e.g., from Protein A, see below) or poly-His tag (e.g., QIAexpress vectors,
QIAGEN Corp., Chatsworth, CA), or a larger protein (e.g., GST using the pGEX-27 vector
(Amrad, USA) or green fluorescent protein using the Green Lantern vector (GIBCO/BRL,
Gaithersburg, MD)). The fusion protein may be expressed and recovered from prokaryotic or
eukaryotic cells and purified by any standard method based upon the fusion vector sequence. For
example, the fusion protein may be purified by immunoaffinity or immunoprecipitation with an
antibody to the non-3-OST portion of the fusion or, in the case of a poly-His tag, by affinity
binding to a nickel column. The desired 3-OST protein or fragment can then be further purified
from the fusion protein by enzymatic cleavage of the fusion protein. Methods for preparing and
using such fusion constructs for the purification of proteins are well known in the art and
numerous kits are now commercially available for this purpose.

Currently preferred methods for small scale purification of 3-OST-1 proteins from the
media of LTA cells grown in culture may be found in Liu et al. (26), and methods for purification
of 3-OSTs produced recombinantly in COS-7 cells, CHO cells, murine primary cardiac
microvascular endothelial cells (CME), murine mast cell line C57.1, and human primary 
endothelial cells of umbilical vein (HUVEC) may be found in the examples below. These methods 
may also be adapted for use with other cell and expression systems to obtain substantially pure 3- 
OST proteins. 

In another aspect, the present invention provides for methods for producing the
above-described proteins. Thus, in one set of embodiments, the isolated nucleic acids of the invention
may be used to transform host cells or create transgenic animals. The proteins of the invention
may then be substantially purified by well known methods including, but not limited to, those
described in the examples below. Alternatively, the isolated nucleic acids of the invention may be
used in cell-free in vitro translation systems. Such systems are also well known in the art and
include, but are not limited to, that described in the examples below.
Antibodies 

The present invention also provides antibodies and methods of making antibodies,
which will selectively bind to and, thereby, isolate or identify wild type and/or variant forms of the
3-OST proteins. The antibodies of the invention have utility as laboratory reagents for, inter alia,
immunoaffinity purification of the 3-OSTs, immunoaffinity purification of 3-OST conjugates or
complexes (e.g., 3-OST-AT, 3-OST-HS), Western blotting to identify cells or tissues expressing
the 3-OSTs, and immunocytochemistry or immunofluorescence techniques to establish the cellular
or extracellular location of the protein.

The antibodies of the invention may be generated using the entire 3-OST proteins of
the invention or using any 3-OST epitope which is characteristic of that protein and which
substantially distinguishes it from other host proteins. Such epitopes may be identified by
comparing sequences of amino acid residues from a 3-OST sequence to computer databases of
protein sequences from the relevant host. Preferably, the epitopes are chosen so as to be highly
immunogenic and specific.

In a preferred embodiment, the immunogen/epitope is a protein sequence of at least
6-12, preferably 10-16, more preferably 16-22 consecutive amino acid residues of the disclosed
3-OST genes. In particular, an analysis of the amino acid sequences of the 3-OST proteins suggests
that there are regions which will have particular utility in generating antibodies. Thus, in
preferred embodiments, the invention provides antigenic 3-OST polypeptides.

3-OST immunogen preparations may be produced from crude extracts (e.g.,
microsomal fractions of cells expressing the proteins), from proteins or peptides substantially
purified from cells which naturally or recombinantly express them or, for small immunogens, by
chemical peptide synthesis. The 3-OST immunogens may also be in the form of a fusion protein
in which the non-3-OST region is chosen for its adjuvant properties and/or the ability to
facilitate purification. As used herein, a 3-OST immunogen shall be defined as a
preparation including a peptide comprising at least 4-8, and preferably at least 9-15 consecutive
amino acid residues of the 3-OST proteins or nucleic acids encoding such a peptide coupled with
transcriptional elements, as disclosed or otherwise enabled herein. Therefore, any 3-OST derived
polypeptide or protein sequences which are employed to generate antibodies to the 3-OSTs
should be regarded as 3-OST immunogens.

The antibodies of the invention may be polyclonal or monoclonal, or may be antibody
fragments, including Fab fragments, F(ab')2, and single chain antibody fragments. In addition,
after identifying useful antibodies by the method of the invention, recombinant antibodies may be
generated, including any of the antibody fragments listed above, as well as humanized antibodies
based upon non-human antibodies to the 3-OST proteins. In light of the present disclosures of
3-OST proteins, as well as the characterization of other 3-OSTs enabled herein, one of ordinary skill
in the art may produce the above-described antibodies by any of a variety of standard means well
known in the art. For an overview of antibody techniques, see Antibody Engineering, 2nd Ed.,
Borrebaeck, ed., Oxford University Press, Oxford (1995).

As a general matter, monoclonal anti-3-OST antibodies may be produced by first
injecting a mouse, rabbit, goat or other suitable animal with a 3-OST immunogen in a suitable
carrier or diluent. As above, carrier proteins or adjuvants may be utilized and booster injections
(e.g., bi- or tri-weekly over 8-10 weeks) are recommended. After allowing for development of a
humoral response, the animals are sacrificed and their spleens are removed and resuspended in,
for example, phosphate buffered saline (PBS). The spleen cells serve as a source of lymphocytes,
some of which are producing antibody of the appropriate specificity. These cells are then fused
with an immortalized cell line (e.g., myeloma), and the products of the fusion are plated into a
number of tissue culture wells in the presence of a selective agent such as HAT. The wells are
serially screened and replated, each time selecting cells making useful antibody. Typically, several
screening and replating procedures are carried out until over 90% of the wells contain single
clones which are positive for antibody production. Monoclonal antibodies produced by such
clones may be purified by standard methods such as affinity chromatography using Protein A
Sepharose, by ion-exchange chromatography, or by variations and combinations of these
techniques.

The antibodies of the invention may be labeled or conjugated with other compounds or 
materials for diagnostic and/or therapeutic uses. For example, they may be coupled to 
radionuclides, fluorescent compounds, or enzymes for imaging or therapy, or to liposomes for the
targeting of compounds contained in the liposomes to a specific tissue location.
Assays for Drugs Which Affect 3-OST Expression 

In another series of embodiments, the present invention provides assays for identifying
small molecules or other compounds which are capable of inducing or inhibiting the expression of
the 3-OST genes and proteins. The assays may be performed in vitro using non-transformed
cells, established cell lines, or the transformed cells of the invention, or in vivo using normal
non-human animals or the transgenic animal models of the invention.

In particular, the assays may detect the presence of increased or decreased expression
of nucleic acids under the transcriptional control of 3-OST promoter and regulatory sequences on
the basis of increased or decreased mRNA expression (using, e.g., the nucleic acid probes
disclosed and enabled herein), increased or decreased levels of protein products encoded by such
nucleic acids (using, e.g., the anti-3-OST antibodies disclosed and enabled herein), or increased or
decreased levels of activity of such a protein (e.g., β-galactosidase or luciferase).

Thus, for example, one may culture cells known to express a particular 3-OST, or
recombinantly modified to express at least a functional fragment or epitope of 3-OST protein
under the transcriptional control of 3-OST promoter, and add to the culture medium one or more
test compounds. After allowing a sufficient period of time (e.g., 0-72 hours) for the compound to
induce or inhibit the expression of the 3-OST, any change in levels of expression from an
established baseline may be detected using any of the techniques well known in the art. Using the
nucleic acid probes and/or antibodies disclosed and enabled herein, detection of changes in the
expression of a 3-OST, and thus identification of the compound as an inducer or inhibitor of
3-OST expression, requires only routine experimentation. For example, one may assay for 3-OST
activity by measuring the conversion of HS^inact into HS^act by methods known in the art (70).




In other embodiments, a recombinant assay is employed in which a reporter gene is
operably joined to 3-OST promoter and regulatory sequences so as to be under the transcriptional
control of these sequences. The reporter gene may be any gene which encodes a transcriptional
or translational product which is readily assayed or which has a readily determinable effect or
phenotype. Preferred reporter genes are those encoding enzymes with readily detectable activity,
including without limitation β-galactosidase, green fluorescent protein, alkaline phosphatase, or
luciferase. The reporter gene is operably joined to the 5' regulatory regions of a 3-OST gene.
The 3-OST regulatory regions may be readily isolated and cloned by one of ordinary skill in the
art in light of the present disclosure of the coding regions of these genes. The reporter gene and
regulatory regions are joined in-frame (or in each of the three possible reading frames) so that
transcription and translation of the reporter gene may proceed under the control of the 3-OST
regulatory elements. The recombinant construct may then be introduced into any appropriate host
cell as described herein. The transformed cells may be grown in culture and, after establishing
the baseline level of expression of the reporter gene, test compounds may be added to the medium.
The ease of detection of the expression of the reporter gene provides for a rapid, high-throughput
assay for the identification of inducers and inhibitors of the 3-OST gene.
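
The baseline comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the compound labels, and the two-fold cutoff are hypothetical choices, since the text does not specify how large a change from baseline should count as induction or inhibition.

```python
def classify_compounds(baseline: float, readings: dict, fold_threshold: float = 2.0) -> dict:
    """Label each test compound by its reporter signal relative to baseline.

    baseline       -- reporter signal measured before any compound is added
    readings       -- mapping of compound name to reporter signal after treatment
    fold_threshold -- hypothetical cutoff; signal must change by this factor
    """
    if baseline <= 0:
        raise ValueError("baseline signal must be positive")
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must be greater than 1")
    calls = {}
    for compound, signal in readings.items():
        fold = signal / baseline
        if fold >= fold_threshold:
            calls[compound] = "inducer"
        elif fold <= 1.0 / fold_threshold:
            calls[compound] = "inhibitor"
        else:
            calls[compound] = "no clear effect"
    return calls


# Made-up reporter readings (arbitrary units) for three placeholder compounds.
calls = classify_compounds(100.0, {"A": 250.0, "B": 40.0, "C": 110.0})
```

In practice the cutoff would be set from replicate measurements of the untreated baseline rather than fixed in advance.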

Compounds identified by this method will have potential utility in modifying the 
expression of the 3-OST genes in vivo. These compounds may be further tested in the animal 
models disclosed and enabled herein to identify those compounds having the most potent in vivo
effects.

Methods for Heparan Modification 

In another aspect, the present invention provides methods for 3-O-sulfating saccharide
residues within a preparation of glycosaminoglycan or proteoglycan polysaccharides in which the
polysaccharides include a polysaccharide sequence of GlcA->GlcNS ±6S. These methods
comprise contacting the GlcA->GlcNS ±6S-containing polysaccharide preparation with 3-OST
protein in the presence of a sulfate donor under conditions which permit the 3-OST to convert the
GlcA->GlcNS ±6S sequence to GlcA->GlcNS 3S ±6S. In particular embodiments, the
GlcA->GlcNS ±6S sequence comprises a part of an HS^act precursor sequence (i.e.,
GlcA->GlcNS ±6S->IdoA 2S->GlcNS ±6S or
IdoA->GlcNAc 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S)
or a part of an HS^inact precursor sequence (i.e.,
IdoA->GlcNS 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S;
IdoA->GlcNAc->GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S;
IdoA->GlcNS->GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S;
IdoA->GlcNAc 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS or
IdoA->GlcNS 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS).
Conversion of the HS^act precursor pool to HS^act increases the fraction with AT-binding
activity and is particularly useful in the production of anticoagulant heparan sulfate
products. Thus, in another embodiment, the present invention provides for means of enriching the
AT-binding fraction of a heparan sulfate pool by contacting the polysaccharide preparation with
3-OST protein in the presence of a sulfate donor under conditions which permit the 3-OST HS^act
conversion activity. In preferred embodiments, the sulfate donor is 3'-phospho-adenosine
5'-phosphosulfate (PAPS).
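
The conversion of GlcA->GlcNS ±6S to GlcA->GlcNS 3S ±6S can be pictured with a toy model in which a chain is a list of residues, each carrying a set of sulfate marks. The Python sketch below is a deliberate simplification and an assumption of this illustration: it marks every GlcNS that directly follows a GlcA, ignores any wider sequence context that may govern the real enzyme, and uses a hypothetical function name and data layout.

```python
def three_o_sulfate(chain):
    """Add a 3S mark to each GlcNS residue that directly follows a GlcA residue.

    chain -- list of (sugar, marks) tuples, e.g. ("GlcNS", frozenset({"6S"}))
    Returns a new list; the input chain is left unchanged.
    """
    result = []
    for index, (sugar, marks) in enumerate(chain):
        preceded_by_glca = index > 0 and chain[index - 1][0] == "GlcA"
        if sugar == "GlcNS" and preceded_by_glca:
            marks = frozenset(marks) | {"3S"}  # model the 3-O-sulfation step
        result.append((sugar, frozenset(marks)))
    return result


# One of the precursor sequences from the text, with the optional 6S present:
# IdoA->GlcNAc 6S->GlcA->GlcNS 6S->IdoA 2S->GlcNS 6S
precursor = [
    ("IdoA", frozenset()),
    ("GlcNAc", frozenset({"6S"})),
    ("GlcA", frozenset()),
    ("GlcNS", frozenset({"6S"})),
    ("IdoA", frozenset({"2S"})),
    ("GlcNS", frozenset({"6S"})),
]
product = three_o_sulfate(precursor)
```

Only the GlcNS that follows GlcA gains the 3S mark; the terminal GlcNS, which follows IdoA 2S, is left as it was.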

Methods of Partially Sequencing Complex Polysaccharides

In another aspect, the present invention provides methods for partially sequencing
complex polysaccharides such as heparan sulfates (HS) or other glycosaminoglycans (GAGs). In
these methods, a pool of polysaccharides which includes sequences which may be 3-O-sulfated is
contacted with a 3-OST protein in the presence of a sulfate donor (e.g., PAPS) under conditions
which permit sulfation by 3-OST. The treated polysaccharides are then subjected to degradation
by enzymes which degrade polysaccharides in a sequence-specific manner (e.g., polysaccharide
lyases; heparinase I, II or III) and the size profile of the resulting fragments is determined. An
identical pool which has not been treated with 3-OST is similarly cleaved by the same enzymes
and a size profile determined. Changes in the size profiles indicate that 3-OST activity has
modified the saccharide units so as to prevent (or permit) cleavage at sites which previously were
(or were not) cleaved. Thus, comparison of the profiles will indicate positions at which the target
sequences for 3-OST activity are present and provide a partial polysaccharide sequence.
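
The profile comparison above amounts to subtracting the untreated size distribution from the 3-OST-treated one and keeping the fragment sizes whose abundance changed. The following Python sketch shows one way this might be done; the function name, the representation of a profile as a mapping from fragment size to fractional abundance, the 0.05 cutoff, and the example numbers are all assumptions for illustration, not measured data.

```python
def profile_shift(untreated: dict, treated: dict, min_change: float = 0.05) -> dict:
    """Return fragment sizes whose abundance differs between the two digests.

    untreated, treated -- mapping of fragment size (e.g. in saccharide units)
                          to fractional abundance in the digest
    min_change         -- hypothetical cutoff below which a difference is ignored
    """
    changes = {}
    for size in sorted(set(untreated) | set(treated)):
        delta = treated.get(size, 0.0) - untreated.get(size, 0.0)
        if abs(delta) >= min_change:
            changes[size] = round(delta, 6)
    return changes


# Invented profiles: after 3-OST treatment, some cleavage sites are blocked,
# so small fragments decrease and larger fragments increase.
untreated_profile = {2: 0.6, 4: 0.3, 6: 0.1}
treated_profile = {2: 0.4, 4: 0.3, 6: 0.3}
shift = profile_shift(untreated_profile, treated_profile)
```

A positive value marks a fragment size that became more abundant after 3-OST treatment (cleavage prevented nearby), and a negative value marks one that became less abundant.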

In another embodiment, the sequence of complex polysaccharides such as HS or GAG
may be partially determined using sequence specific polysaccharide affinity fractionation. To this
end, 3-OST proteins which lack enzymatic function can be identified or produced (e.g., altering or
deleting a portion of the catalytic ST domain by site-directed or deletion mutagenesis). These
inactive forms will bind GAGs in a sequence dependent manner. For example, the 3-OST-1
protein normally, minimally, binds a GAG sequence containing GlcA-GlcNS ±6S. When the
active site of this protein is neutralized, the ka of the protein for these sequences will be relatively
unaffected. This reagent will allow sequence-specific saccharide affinity fractionation from
complex mixtures of GAGs. The purified structures can be degraded in a step-wise fashion with
exolytic, endolytic enzymes and/or nitrous acid, and the resulting degradation products can be
compared to standard compounds of known structure. This method will allow the quantitation
and characterization of known structures contained within unknown complex polysaccharide
samples.

In another embodiment, partial sequence can be obtained using the 3-OSTs of the
invention or other heparan sulfate sequence specific binding ligands as protective groups prior to
treating the HS or GAG with modifying agents that detectably alter the HS or GAG. Useful
protective groups include catalytically inactive enzymes, chimeric enzymes and small molecule
ligands with identified sequence binding specificities. The protecting group is contacted with the
heparan or other glycosaminoglycans (GAGs), and the resultant complex is treated with one or
more modifying agents. Useful modifying agents include catalytically active heparan lyases,
sulfotransferases, N-deacetylases, epimerases, or chimeric proteins of the invention. In
embodiments where multiple protecting groups and/or modifying reagents are used in
combination, the sample is first contacted with the protective group, then each modifying reagent
may be contacted with the protected polysaccharide, either simultaneously or in turn. The
protective group will interfere with the ability of a chemically modifying agent to interact with,
attach to and/or cleave specific GAG sequence motifs. The sample can then be analyzed for
ligand-specific protection and/or cleavage to elucidate the sequence of the original GAG by
separation and/or quantitation methods known in the art.

In some embodiments, as a preliminary step, full length heparans and GAG oligomers can
be fractionated over an immobilized affinity ligand immobilized at their reducing ends via
hydrazide chemistry. The fraction of GAG captured by the immobile phase permits a quantitation
of the mass or total percent of the target sequence (out of total GAG). Thus, unique heparan or
other GAG structures may be concentrated and/or specifically eluted for further analysis.

One useful method for the detection of binding is the Biomolecular Interaction Assay or
"BIAcore" system developed by Pharmacia Biosensor and described in the manufacturer's
protocol (LKB Pharmacia, Sweden). In light of the present disclosure, one of ordinary skill in the
art is now enabled to employ this system, or a substantial equivalent, to identify proteins or other
compounds having sequence-specific HS or GAG binding capacity, or HS or GAG sequences
having 3-OST binding capacity. Such systems utilize surface plasmon resonance, an optical
phenomenon that detects changes in refractive indices. A sample of interest is passed over an
immobilized ligand (e.g., a 3-OST fusion protein or specific GAG) and binding interactions are
registered as changes in the refractive index.

Examples

Cell Lines and Cell Culture 

The clonal L cell line LTA (35, 41), the generation of clone 33, an LTA transfectant that
over-expresses the ryudocan12CA5 cDNA (33), a rapidly growing revertant of clone 33, L-33+ (26),
and RFPEC, an immortalized line derived from rat fat-pad endothelial cells (8) have previously
been described. Primary mouse neonatal endothelial cells from the cardiac microvasculature of 
day 3-5 neonates (CME cells) (from Dr. Jay Edelberg, MIT/Beth Israel Hospital) and COS-7 cells 
(ATCC) were employed. Primary human umbilical cells (HUVEC) were maintained according to
the supplier's (Clonetics Inc.) protocol. Unless otherwise stated, all cell lines were maintained in
logarithmic growth by subculturing biweekly in Dulbecco's modified Eagle medium (Life
Technologies, Inc.) containing 10% fetal bovine serum, 100 µg/ml streptomycin, and 100 units/ml
penicillin at 37 °C under 5% CO2 humidified atmosphere, as previously described (42).
Exponentially growing cultures were generated by inoculating 54,000 cells/cm^2 and incubating for
two days, whereas post-confluent cultures were produced by inoculating 250,000 cells/cm^2 and
allowing growth for 10 days with medium exchanges on days 4, 7, 8, and 9.
Peptide Purification and Sequencing 

The purification of mouse 3-OST-1 from L-33+ has been previously described (26) and the
final step 4 product was concentrated by reverse phase chromatography on an HP 1090 M system
(Hewlett Packard) equipped with a C4 reverse phase HPLC column (250 x 2.1 mm, 300 Å pore
size, 5 µm particle size) (Vydac, number 214TP52) equilibrated in 1.6% acetonitrile (v/v), 0.1%
TFA (v/v). After application of sample, the reverse phase matrix was washed with 60%
acetonitrile, 0.1% TFA, and bound species were eluted with 78.4% acetonitrile, 0.1% TFA.
Samples of 1.5 or 3 µg, from two independent purifications, were digested with 0.15 or 0.3 µg,
respectively, of endopeptidase Lys-C (Wako) in a reaction volume of 100 µl containing 1%
RTX-100 (Calbiochem), 10% acetonitrile and 100 mM Tris-HCl pH 8.0, at 37 °C for ~16 h (43).
Digestion products were chromatographed on an HP 1090 M system (Hewlett Packard) equipped
with the above described C4 reverse phase HPLC column equilibrated in 98% Buffer A (0.1%
TFA (v/v))/2% Buffer B (80% acetonitrile (v/v)/0.85% TFA (v/v)). After application of digestion
products, the reverse phase matrix was washed with 98% Buffer A/2% Buffer B, and bound
species were eluted with linear gradients of Buffer B increasing to 37.5% over 60 min, to 75%
over 30 min, and to 98% over 15 min (44). The eluate was monitored for absorbance at 210 and
280 nm, peptide peaks were individually collected and analyzed with a model 477A/120A Protein
Sequenator (Applied Biosystems). In addition, the NH2-terminal sequence of 1 µg of
concentrated 3-OST-1 sample was directly determined.
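
The three-segment Buffer B gradient used for the peptide separation can be written as a piecewise-linear function of time. The following is only an illustrative sketch: it assumes that the segments run back to back starting from the 2% Buffer B wash condition at time zero, and the function names are hypothetical rather than part of the described method.

```python
# Illustrative sketch of the peptide-mapping gradient described above.
# Assumption: the three linear segments run consecutively, starting from
# the 98% Buffer A / 2% Buffer B wash condition at t = 0.
GRADIENT_POINTS = [(0.0, 2.0), (60.0, 37.5), (90.0, 75.0), (105.0, 98.0)]


def percent_buffer_b(t_min):
    """Return the percentage of Buffer B at t_min minutes into the gradient."""
    if t_min <= GRADIENT_POINTS[0][0]:
        return GRADIENT_POINTS[0][1]
    if t_min >= GRADIENT_POINTS[-1][0]:
        return GRADIENT_POINTS[-1][1]
    # Linear interpolation inside the segment that contains t_min.
    for (t0, b0), (t1, b1) in zip(GRADIENT_POINTS, GRADIENT_POINTS[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_acetonitrile(t_min, acetonitrile_in_b=80.0):
    """Acetonitrile content (% v/v), given that Buffer B is 80% acetonitrile."""
    return percent_buffer_b(t_min) * acetonitrile_in_b / 100.0


if __name__ == "__main__":
    for t in (0, 30, 60, 90, 105):
        print(t, percent_buffer_b(t), percent_acetonitrile(t))
```

Under these assumptions the starting and final conditions correspond to 1.6% and 78.4% acetonitrile, the same concentrations quoted for equilibration and elution in the concentration step.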
Isolation of Mouse 3-OST-1 Clones

Isolation of Cytoplasmic and Poly(A)+ RNA. Cytoplasmic RNA (17.5 mg) was isolated
from post-confluent cultures of LTA cells (12 flasks of 175 cm², ~1.6 x 10^ cells) by a
modification of the procedure of Favaloro (45). Monolayers were twice washed with PBS, cells
were recovered by trypsinization and centrifugation (1000 x g for 2 min), and cell pellets were
washed by resuspension in PBS followed by centrifugation (1300 x g for 4 min). Cells were lysed
by vortexing for 30 sec in 12 ml of ice cold 50 mM Tris, pH 7.4, 140 mM NaCl, 5 mM EDTA,
1% Triton X-100, 5 mM vanadium ribonucleoside complexes (Life Sciences Technologies);
samples were incubated on ice for 10 min and then vortexed for 1 min. Nuclei were pelleted by
centrifugation at 6000 x g for 10 min, the supernatant was mixed with an equal volume of 200
mM Tris, pH 7.4, 300 mM NaCl, 2% SDS, 25 mM EDTA, containing 200 µg/ml of proteinase K
(Boehringer Mannheim), and the mixture was incubated at 65 °C for 2 hr. Samples were
extracted twice against an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1), the
aqueous phase was combined with 0.7 volumes of isopropanol, cytoplasmic RNA was pelleted by
centrifugation at 3500 x g for 10 min, and was resuspended in 3.6 ml of 10 mM Tris, pH 7.4, 1
mM EDTA. Poly(A)+ RNA (59 µg) was isolated from 16 mg of cytoplasmic RNA by two
sequential purifications against 100 mg of oligo(dT) cellulose (Life Sciences Technologies,
#15939-010) according to the manufacturer's specifications except that binding and wash buffers
contained 0.1% SDS and LiCl was substituted for NaCl. The final eluate (1.5 ml) was extracted
against 1.5 ml of phenol/chloroform/isoamyl alcohol (25:24:1), the aqueous phase was then
adjusted to 100 mM LiCl and 260 mM NaCl, an equal volume of isopropanol was added, the
mixture was centrifuged at 15,000 x g for 30 min and the poly(A)+ RNA pellet was recovered in
40 µl of diethyl pyrocarbonate treated water.

PCR Cloning and Generation of a Mouse 3-OST-1 Probe. Degenerate PCR primers 1S,
2S, 2A, and 3A (described in Shworak et al. (1997) J. Biol. Chem. 272, in press) were obtained
from Bio Synthesis. First strand cDNA was generated in a 50 µl volume from 5 µg of LTA
poly(A)+ RNA primed with oligo(dT) using an RT-PCR kit (Stratagene, La Jolla, CA) according
to the manufacturer's specifications. Touchdown PCR (46, 47) reactions (50 µl) contained 1 µl
of first strand cDNA, 25 pmol of each primer, 0.25 µl of AmpliTaq Gold (Perkin Elmer), 200 µM
of each dNTP and 1 x GeneAmp PCR buffer. Two distinct sets of touchdown PCR conditions
were required to obtain optimal yields of product. For amplification with primers 1S and 2A,
reactions were heated to 95 °C for 9 min, subjected to 20 cycles of 94 °C for 30 sec, and 68 °C
for 1 min with a 0.5 °C reduction per cycle, followed by 20 cycles of 94 °C for 30 sec, 58 °C for
30 sec with a 0.5 °C reduction per cycle, and 75 °C for 30 sec, then 15 cycles of 94 °C for 30 sec,
55 °C for 10 sec, and ramping to 75 °C over 50 sec. Alternatively, for amplification with primers
1S and 3A or primers 2S and 3A, reactions were heated to 95 °C for 4 min, subjected to 47 cycles
of 95 °C for 30 sec, and 69.5 °C for 2 min with 0.2 °C and 1 sec reductions per cycle, followed
by 25 cycles of 95 °C for 30 sec, 60 °C for 15 sec, and ramping to 75 °C over 1 min.
Amplification products were purified as the retentate from centrifugal ultrafiltration against a
30,000 molecular weight cutoff membrane (Millipore, # SK1P343JO), then 200 ng of DNA was
end polished with Pfu DNA polymerase and subcloned into pCR-Script Amp SK(+) (Stratagene,
La Jolla, CA, #211188) according to the manufacturer's specifications. A resulting plasmid,
pNWS182, contained the 1S/3A amplification product of 779 bp which was released by digestion
with EcoRI and SacII, and isolated by low melting point agarose gel electrophoresis. A 32P-
labeled primer extension probe was then generated with a random primer labeling kit (Stratagene,
La Jolla, CA, # 300385) by replacing the random primers with 5 µM of primer 3A.
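
The per-cycle annealing temperatures implied by the touchdown conditions above can be tabulated with a short helper. This is a sketch only: it assumes that the first cycle of each stage runs at the stated starting temperature and that the reduction is applied once per subsequent cycle, and the function name is hypothetical.

```python
def touchdown_temps(start_c, step_c, cycles):
    """Return the annealing temperature (deg C) used in each cycle of a stage.

    Assumption: cycle 1 runs at start_c and the temperature drops by
    step_c before each later cycle.
    """
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


# Primers 1S/2A: 20 cycles from 68 deg C, then 20 cycles from 58 deg C,
# each with a 0.5 deg C reduction per cycle.
stage1 = touchdown_temps(68.0, 0.5, 20)
stage2 = touchdown_temps(58.0, 0.5, 20)

# Primers 1S/3A or 2S/3A: 47 cycles from 69.5 deg C with 0.2 deg C reductions.
stage3 = touchdown_temps(69.5, 0.2, 47)

if __name__ == "__main__":
    for name, stage in (("stage1", stage1), ("stage2", stage2), ("stage3", stage3)):
        print(name, len(stage), "cycles:", stage[0], "->", stage[-1])
```

With this convention each touchdown stage ends just above the annealing temperature of the stage that follows it (58 °C after the first stage of the 1S/2A protocol, 60 °C after the 47-cycle stage). The 1 sec per cycle shortening of the combined annealing/extension step in the second protocol is not modeled here.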

Construction and Screening of an L Cell cDNA Library. Using the manufacturer's
recommended conditions, an oligo(dT)-primed λ ZAP Express cDNA library (Stratagene, La Jolla,
CA, # 200451) was generated from 5 µg of LTA poly(A)+ RNA which had been pretreated with
methylmercury hydroxide. About 1.5 x 10^ primary recombinants were plaque amplified by
infection into E. coli XL1-Blue MRF'. From the amplified library, 1.3 x 10^ plaques were
transferred to Colony/Plaque Screen (Du Pont-New England Nuclear) and screened with the
above described 32P-labeled probe specific for 3-OST-1. Hybridizations were performed at 42 °C
in 1.7 x SSC, 8.3% dextran sulfate, 42% formamide, 0.8% SDS and filters were washed twice
with 2 x SSC, 1% SDS for 30 min at 65 °C. Positive clones were plaque purified and then in vivo
excised into pBK-CMV based phagemids by infection with ExAssist helper phage followed by
transduction of filamentous phage particles into E. coli XLOLR.
Isolation of Human 3-OST-1 cDNA Clones

The National Center for Biotechnology Information data bank of I.M.A.G.E. Consortium
(LLNL) expressed sequence tag cDNA clones (48) was probed with the deduced mouse 3-OST-1
amino acid sequence to reveal three partial length species. I.M.A.G.E. Consortium CloneID
220372 (accession numbers H86812 and H86876) was from the retinal library of Soares
(N2b4HR), whereas clones 301725 (accession numbers N90867 and W16558) and 301726
(accession numbers N90856 and W16555) were from the fetal lung library of Soares (NbHL19W)
and were obtained from the TIGR/ATCC Special Collection (ATCC). The EcoRI/NotI insert of
clone 220372 was 32P labeled by random priming and used to screen 5 x 10^ plaques from a λ
TriplEx Brain cDNA library (Clontech, Palo Alto, CA), as described above. Positive plaques
were purified, TriplEx based plasmids were in vivo excised according to the manufacturer's
protocol, and were sequenced as described below.
Characterization of Mouse and Human 3-OST-1 cDNA Clones

The 5' and 3' regions of all partial and full length clones were enzymatically sequenced
from flanking primer sites of the respective cloning vectors. For full length clones the remaining
sequence of both strands was obtained with internally priming oligonucleotides. Automated
fluorescence sequencing was performed with Perkin Elmer Applied Biosystems Models 373A and
477 DNA sequencers. Each reaction typically yielded 400 to 600 bases of high quality sequence.
cDNA sequence files were aligned and compiled with the program Sequencher 3.0 (Gene Codes
Corp.). All additional manipulations were performed with the University of Wisconsin Genetics
Computer Group sequence analysis software package. Sequence comparison searches were
performed on the databases of GenBank, EMBL, DDBJ, PDB, SwissProt, PIR, and dbEST.
Expression of 3-OST-1 cDNAs

Construction of Expression Plasmids. The plasmid pCMV-3-OST contains the mouse
3-OST-1 cDNA, an EcoRI/XhoI fragment from pNWS228, inserted between the CMV promoter
and the bovine growth hormone polyadenylation signal of EcoRI/XhoI digested and phosphatase
treated pcDNA3 (Invitrogen). The plasmid pCMV-ProA3-OST is of similar structure, except the
first 26 amino acids of 3-OST-1 are replaced with 291 amino acids encoding a fusion protein of the
transin leader sequence followed by Protein A and a factor Xa cleavage site. pCMV-ProA-3-OST
was generated by ligating a BamHI/SmaI fragment containing the Protein A region from
pRK5F10PROTA (49), and animal (end-filled with T4 polymerase)/XhoI fragment containing
most of the mouse 3-OST-1 cDNA from pNWS228, into BamHI/XhoI digested and phosphatase
treated pcDNA3 (Invitrogen). The in vitro transcription plasmid, pNWS237, contains a T3
promoter site 5' of the human 3-OST-1 cDNA and was constructed by inserting complementary
oligonucleotides (Bio Synthesis) into the EcoRI site of the TriplEx based plasmid, pJL30.

Transient Expression of the Mouse 3-OST-1 cDNA in COS-7 Cells. For each expression
construct, three 175 cm² flasks were seeded with 3.6 x 10^ COS-7 cells, 6 h later the medium was
exchanged with DMEM containing 10% Nu-Serum (Life Technologies, Inc.) with 100 µg/ml
streptomycin and 100 units/ml penicillin, and cells were grown for an additional day. Monolayers
were washed with PBS then incubated at 37 °C for 2.5 h with 10 ml/flask of freshly prepared
DMEM containing 235 µg/ml DEAE-dextran (M.W. 500,000, Pharmacia), 9.5 mM Tris-HCl, pH
7.4, 0.9 mM chloroquine-diphosphate (Sigma), and 3 µg/ml of the appropriate pcDNA3 based
expression plasmid. Monolayers were then exposed to freshly prepared 10% DMSO in PBS for
1.5 min, washed twice with nonsupplemented DMEM, fed 30 ml/flask of DMEM containing 10%
fetal bovine serum, 100 µg/ml streptomycin, and 100 units/ml penicillin, and cells were grown for
an additional day. Monolayers were washed with PBS, then cells were grown in 40 ml/flask
Serum-Free Medium (DMEM containing 25 mM HEPES, pH 8.0, 1% Nutridoma SP (Boehringer
Mannheim) (v/v), an additional 2 mM glutamine, 10 ng/ml biotin (Pierce), 100 µg/ml
streptomycin, 100 units/ml penicillin, and 1 x of a previously described Trace Metal Mix (26)) for
24 h. COS-cell conditioned Serum-Free Medium was harvested, debris was removed by
centrifugation at 1,000 x g for 10 min followed by filtration through a 0.45 µm membrane, then
samples were either immediately processed or were snap frozen with liquid nitrogen and stored at
-80 °C. Occasionally, conditioned medium from a second incubation of 8-24 h was also collected.
Purification of Wild-type and Protein A Tagged Mouse Recombinant 3-OST-1. Wild-type
mouse recombinantly expressed 3-OST-1 enzyme (r3-OST-1) was purified, at 4 °C, from 240 ml
of freshly generated Serum-Free Medium conditioned by COS-7 cells transfected with pCMV-3-
OST. The medium was adjusted to pH 8.0, mixed with an equal volume of 2% glycerol, then loaded
(25 ml/h) onto a heparin-AF Toyopearl-650M column (0.8 x 5.7 cm) (TosoHaas,
Montgomeryville, PA) equilibrated in 50 mM NaCl, 10 mM Tris-HCl, pH 8.0, 1% glycerol (v/v)
(Buffer C). The column was washed with 20 ml of Buffer C at a flow rate of 0.8 ml/min, then
with 20 ml of 150 mM NaCl, 10 mM Tris-HCl, pH 8.0, 1% glycerol (v/v) at a flow rate of 0.5
ml/min, and protein was eluted at a flow rate of 0.25 ml/min with a 20 ml linear NaCl gradient
extending from 150 mM to 750 mM NaCl in Buffer C. The fractions exhibiting HSact conversion
activity (approximately 4 ml) were pooled, brought to a final concentration of 0.6% CHAPS
(w/v) (Sigma) and dialyzed for 16 h against 4 l of 25 mM MOPS (3-[N-morpholino]
propanesulfonic acid) (Sigma), pH 7.0, 1% glycerol (v/v), 0.6% CHAPS (w/v) (MCG buffer)
containing 50 mM NaCl. The dialysate was applied to a 3',5'-ADP-agarose column (0.8 x 1.2 cm,
3.7 mmol of 3',5'-ADP/ml of gel) (Sigma) and eluted as previously described (26). The fractions
containing HSact conversion activity were pooled (approximately 4 ml), aliquoted, frozen in liquid
nitrogen and stored at -80 °C.

Protein A tagged mouse r3-OST-1 was purified, at 4 °C, from 155 ml of previously frozen
Serum-Free Medium conditioned by COS-7 cells transfected with pCMV-ProA3-OST. IgG
agarose beads (310 µl of a 50/50 slurry; Sigma) were gently stirred with the conditioned medium
for 3 h, recovered by centrifugation at 2,000 x g for 10 min, and washed twice with 1 ml of MCG
containing 250 mM NaCl to remove nonspecifically bound protein. Protein A fusion-protein was
eluted from the beads with two sequential 30 min incubations in 100 µl of 50 mM sodium acetate,
pH 4.5, 150 mM NaCl, 0.6% CHAPS and 1% glycerol. The pooled eluates were combined with
an equal volume of 500 mM MOPS, pH 7.0, 0.6% CHAPS, and 1% glycerol, then aliquoted,
frozen in liquid nitrogen and stored at -80 °C.
Retroviral Transduction of CHO and MNE Cells with 3-OST-1

Plasmid retrovirus vector construction. A retroviral transduction system was used to
transduce CHO cells and mouse neonatal endothelial (MNE) cells. This system may serve as a
model for in vivo transduction for use in gene therapy.

The retrovirus backbone plasmid pMSCV-PGK-EGFP is a derivative of pMSCVpac
(Dr. Robert Hawley, University of Toronto). The puromycin acetyl transferase gene cassette in
pMSCVpac was removed and replaced with an Enhanced GFP (Dr. David Baltimore, MIT). The
pMSCV-PGK-GFP vector was assembled by digestion of the plasmid with HindIII and ClaI,
followed by treatment with Klenow fragment. The EGFP cistron 720 bp fragment was derived
from the digestion of pMSCV-EGFPpac with EcoRI, and blunting with the Klenow fragment.
The EGFP blunt-ended fragment was then ligated into the blunt-ended pMSCV vector. The
resulting plasmids were tested for proper orientation by restriction analysis. The reporter virus,
pMSCVPLAP, is designed to express the wild type human placental alkaline phosphatase (PLAP)
transcribed from the 5' LTR. pMSCV-SEAP-PGK-EGFP was made by cloning the secreted
alkaline phosphatase (SEAP) BglII and HpaI 1.723 kb fragment from pSEAP2-basic plasmid
(Clontech, Palo Alto, CA) into the BglII and HpaI cut pMSCV-PGK-EGFP vector. pCMV-3-OST
was digested with BglII and XhoI to release the wild type mouse 3-OST-1 cDNA. The 1.623 kb
3-OST-1 cDNA fragment was cloned into the BglII and XhoI sites in pMSCV-PGK-EGFP. The
occurrence of the insert of interest present in the correct orientation was ascertained by restriction
analysis. All plasmid DNA prepared for transfection was made with the Invitrogen SNAP-MIDI
kits according to the manufacturer's directions.

Cells and cell culture. Dulbecco's modified Eagle medium (DMEM), F-12 Ham's medium
and penicillin/streptomycin, 0.25% trypsin, 10 mM EDTA, were obtained from Life Technologies,
Inc., GIBCO-BRL (Gaithersburg, MD). The PHOENIX ecotropic retroviral packaging cell line
(ATCC #SD 3444) was grown in DMEM, 10% heat-treated fetal bovine serum (FBS) (JRH
Biosciences, Lenexa, KS), 100 units/ml penicillin, 100 µg/ml streptomycin. PHOENIX cells were
subcultured three times weekly at a split ratio of approximately 1:8 in a 37 °C humidified, 5.0% CO2
incubator. CHO-K1 (ATCC CCL 61) cells (CHO) were grown in F-12 medium supplemented with
10% fetal bovine serum, 100 units/ml penicillin, and 100 µg/ml streptomycin in a 37 °C humidified,
5.0% CO2 incubator. CHO cells were subcultured three times weekly at a split ratio of approximately
1:4 in a 37 °C humidified, 5.0% CO2 incubator. 1 x 10^ CHO cells were transfected with 10 µg of
pcB7-ECOTROPIC (generous gift of Dr. Harvey Lodish) by the standard calcium phosphate
precipitation technique. Plasmid pcB7-ECOTROPIC expresses the MCAT1 gene (ecotropic
retrovirus receptor cDNA) and hygromycin resistance gene transcribed from separate constitutive
promoters. The transfected cells were selected for hygromycin resistance in 200 µg/ml
hygromycin (Life Technologies). The stable, hygromycin-resistant clones were assayed for their
ability to take up and express reporter virus (MSCVPLAP). Fixation and staining for cell-bound
alkaline phosphatase was performed by standard techniques. CHO clone 4B was chosen because
it transduced most efficiently at the highest dilution tested (i.e., 1:10,000), and was expanded for
further analysis. Transduction of CHO4B with ecotropic retroviruses is equal to that achievable
with NIH3T3 cells. Low passage number (passage 2-5), primary mouse neonatal cardiac
endothelial cells (MNE) were prepared by standard techniques. MNE cells were cultured in a 1:1
vol/vol admixture of EGM:EGM-2 (CLONETICS) in a 37 °C, humidified, 5.0% CO2 incubator.
MNE cells were subcultured once weekly at a split ratio of approximately 1:3 in a 37 °C,
humidified, 5.0% CO2 incubator.

Northern blot analysis. Total RNA was prepared from confluent T-80 flasks of each of
the transduced and untransduced cells using the QIAGEN RNeasy kit with QIASHREDDER.
10 µg of total cellular RNA was denatured and resolved by electrophoresis in a 1.5% agarose gel,
and then blotted onto GENE-Screen+ (DuPont NEN) with 2X SSPE. The membrane was then
UV cross-linked using a STRATAlinker. 32P-radiolabeled cDNA probes were prepared from the
fragments of DNA used for cloning the mouse 3-OST-1 and SEAP as described above.
Radiolabeled probes were prepared using 25 ng of each template and the Amersham Megaprime
kit, and α-32P dCTP from DuPont NEN according to the manufacturer's directions.
Hybridizations were performed in sealable plastic bags at 68 °C with 1 x 10^ cpm of probe/ml in
10 ml of QUICKHYB (Stratagene, La Jolla, CA), following the manufacturer's instructions.
Post-hybridization washes were: once for 15 minutes in 1X SSPE, 1.0% SDS at 45 °C; and then
twice for 15 minutes each in 0.2X SSPE, 0.5% SDS at 65 °C. After washing, the blots were briefly
air dried, placed in sealable plastic bags then exposed to Kodak XAR-MS film with intensifying
screens at -80 °C for from overnight to five days. Quantitation of hybridizing signal intensity was
performed using a Betascope 603 blot analyzer. Transcripts derived from the 5' LTR of these
engineered proviruses are large (ca. 7 kb). Since they are large, the provirus has multiple sites of
transcriptional initiation (5' LTR and pgk promoters), and the 3-OST-1 construct has
more than one poly(A) addition signal, bona fide hybridizable mRNA will appear as different sizes
in northern blot analysis. The total amount of hybridizing material detected, per sample lane, with
any one probe was used to calculate and compare mRNA expression levels.

Virion production. Virions were produced by programming ecotropic PHOENIX
packaging cells with recombinant provirus plasmids using the calcium phosphate transfection
technique. 10 µg/well of each recombinant retroviral construct plasmid was transfected via
calcium precipitation with an overnight incubation period. Following the precipitation step, the
cells were re-fed with 2 ml/well of fresh DMEM and incubated overnight. Each 2 ml of viral
supernatant was collected and flash-frozen in liquid nitrogen and stored at -80 °C, or used directly
after a low-speed centrifugation.

Transduction protocol. Target cells were trypsinized, counted with a Coulter cell counter
and then plated at 150,000 cells (NIH3T3/CHO4B) or 50,000 cells (MNE) per well of a 6-well
cluster plate. 24 hours later, target cells (<70% confluent) were incubated overnight with viral
supernatants containing as adjuvants either 5 µg/ml polybrene for NIH3T3/CHO4B or 25 µg/ml
DEAE-dextran (Pharmacia) for MNE. After 12 hours of virus exposure, the growth medium was
replaced. CHO cells destined for FACS sorting were exposed to recombinant retrovirus two
times at a multiplicity of infection (MOI) of 0.3. MNE cells were transduced one time for 12
hours at an MOI of 0.74 for recombinant 3-OST-1 virus and 0.72 for recombinant SEAP virus.
Transduced cells were allowed to incubate in fresh growth medium for 48 hours prior to FACS to
allow for maximum proviral expression. Recombinant virus titers ranged from 1 x 10^ - 2 x 10^
infectious particles per ml as measured with either NIH3T3 or CHO4B cells using FACS analysis
scoring for EGFP positive cells. Virus titers were reduced approximately eight to ten-fold on
primary MNE cells relative to NIH3T3.
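
The relationship between the multiplicities of infection quoted above and the fraction of cells expected to carry at least one provirus can be estimated with a simple Poisson model. This model is an assumption introduced here for illustration and is not stated in the protocol; the function names are hypothetical.

```python
import math


def fraction_transduced(moi):
    """Expected fraction of cells receiving at least one infectious particle.

    Assumption (not from the protocol): infectious particles are distributed
    over the cells according to a Poisson distribution with mean `moi`.
    """
    return 1.0 - math.exp(-moi)


def moi_from_fraction(fraction_positive):
    """Inverse relation: effective MOI implied by an observed positive fraction."""
    return -math.log(1.0 - fraction_positive)


if __name__ == "__main__":
    # MOI values taken from the transduction protocol above.
    for label, moi in (("CHO, one exposure", 0.3),
                       ("MNE, 3-OST-1 virus", 0.74),
                       ("MNE, SEAP virus", 0.72)):
        print(f"{label}: MOI {moi} -> {fraction_transduced(moi):.1%} of cells")
```

The inverse function illustrates how a fraction of EGFP positive cells scored by FACS could be converted back into an effective MOI under the same assumption.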
Cell-Free Synthesis of Mouse and Human r3-OST-1

Synthetic capped mouse and human 3-OST-1 mRNAs were generated from NotI
linearized pNWS228 and HindIII linearized pNWS237, respectively, using T3 polymerase and
m7G(5')ppp(5')G, as previously described (50). Unlabeled in vitro translation reactions (25 µl)
contained 0.25 µg of synthetic mRNA, 1.8 µl canine pancreatic microsomal membranes
(Promega), 0.5 µl each of Amino Acid Mixture Minus Leucine and Amino Acid Mixture Minus
Methionine, and were performed with nuclease-treated reticulocyte lysate (Promega), according
to the manufacturer's specifications.

Measurement of HSact Conversion Activity. The HSact conversion activity, a 3-OST-1
catalyzed reaction which requires unlabeled PAPS to convert 35S-HSinact into 35S-HSact, of crude
and purified r3-OST-1 samples was determined by comparison against a standard curve generated
with 1 to 32 units of previously purified native 3-OST-1, as previously described (26). The 35S-
HSinact substrate was purified from metabolically labeled cell surface HS of exponentially growing
clone 33 cells, as previously described (35).
Identification of Enzymatic Reaction Products

35S-labeling of HS by r3-OST-1. 35S-labeled HS was generated by incubating the various
forms of r3-OST-1 with [35S]PAPS and unlabeled HS^^ which were prepared as previously
described (26, 35). Wild-type and Protein A tagged r3-OST-1 (2500 units of HSact conversion
activity) purified from COS cell conditioned medium, were incubated in a 500 µl reaction mixture,
as previously described (26), for 2 h at 37 °C and 35S-labeled polysaccharides were purified by
DEAE-Sepharose chromatography as previously described (26). For cell-free synthesized r3-
OST-1, 35S-labeling of HS was performed in a reticulocyte lysate based reaction mixture (35)
except that 100 µl reactions contained 100 to 300 units of in vitro translated r3-OST, 180 nM
unlabeled HS^*^, 5 µM PAPS (60 x 10^ cpm) and samples were incubated at 37 °C for 2 h. The
reaction was quenched by the addition of 300 µl of 267 mM NaCl, 13.3 µg/ml glycogen and
extraction against 600 µl of phenol/chloroform/isoamyl alcohol (25:24:1). 35S-labeled GAGs
were ethanol precipitated (35) and then isolated by DEAE chromatography as previously
described (26).

Identification of the Site of Sulfation on HSact and HSinact. The DEAE eluates containing
35S-labeled polysaccharide were vacuum concentrated to 1/5 volume, then desalted at a flow rate
of 0.9 ml/min on TSK G3000 PWxl (0.78 x 30 cm) and TSK G2500 PWxl (0.78 x 30 cm)
(TosoHaas) columns connected in series equilibrated in 0.1 M ammonium bicarbonate. The
desalted product was then affinity fractionated using AT/ConA gel to obtain HSact and HSinact as
described previously (26). Analysis of labeled products by treatment with GAG lyases and low
pH nitrous acid were performed as previously described (42). In addition, the HSact and HSinact
samples were each subjected to hydrazinolysis, high pH nitrous acid (pH 5.5), low pH nitrous acid
(pH 1.5), and sodium borohydride reduction with the resultant disaccharides characterized on
reverse phase ion pairing HPLC (RPIP-HPLC) as previously reported (33, 34). The identification
of [35S]GlcA→AMN-3-O-SO3 and [35S]GlcA→AMN-3,6-O-(SO3)2 was confirmed by co-
chromatography on RPIP-HPLC with the appropriate 3H-labeled disaccharide standards, as
described in prior publications (33, 34).
Northern Blot Analysis

Total RNA from RFPEC and primary mouse CME cells was isolated by the method of
Chomczynski and Sacchi (51), whereas poly(A)+ RNA was isolated from HUVEC cells as
described above for LTA cells. Total RNA from the mast cell line Cl.MC/C57.1 (C57.1) (52)
was from Dr. Stephen J. Galli (Beth Israel Hospital). Samples were resolved on 1.2%
formaldehyde-agarose gels and subjected to Northern blot analysis as previously described (50).
Mouse and human samples were hybridized with mouse or human probes, respectively, and
washed as described for library screening, above, except hybridizations were performed at 60 °C.
Peptide Sequencing and PCR Generation of a Mouse 3-O-Sulfotransferase-1 (3-OST-1) Probe

The information necessary for the molecular cloning of mouse heparan sulfate D-
glucosaminyl 3-O-sulfotransferase-1 (3-OST-1) was obtained by sequencing the amino terminus
and Lys-C generated peptides of the enzyme that we had previously purified from large quantities
of serum-free tissue culture medium conditioned by an L cell line (26). These studies established
the structures of 14 partially overlapping peptides which encompass 185 amino acid residues.
Degenerate PCR primers were synthesized based on the sequence of the amino terminus (primer
1S) and two endopeptidase derived fragments (primers 2S, 2A, and 3A). When PCR was
performed on an LTA first strand cDNA template, products of about 210 (primers 1S/2A),
780 (primers 1S/3A), and 610 (primers 2S/3A) bp were obtained, which suggests that all of the
primer sites are contained within a single cDNA. To confirm this supposition, the two largest
fragments were cloned into pCR-Script Amp SK(+) and inserts were sequenced, which revealed
that the 1S/3A product is 779 bp and contains the 611 bp 2S/3A product. The 779 bp insert
encodes 12 of the sequenced peptide fragments and so was 32P-labeled, as described above, and
used as a probe for cDNA library screening.
Isolation and Characterization of Mouse 3-OST-1 cDNAs

An amplified λ ZAP Express LTA cDNA library of 1.5 x 10^ primary recombinants was
constructed and 1.3 x 10^ plaques were screened with the above described probe, which revealed
40 positives that were plaque purified and in vivo excised into plasmids. The cDNA inserts of
each plasmid were characterized to eliminate duplicated recombinants due to library amplification.
Size was determined by liberating cDNA inserts with digestion at flanking EcoRI and XhoI
restriction sites followed by agarose gel electrophoresis; furthermore, the sequence at both ends
of each insert was obtained from flanking vector primer sites. This analysis revealed 25 unique
primary recombinants which predominantly contained inserts of approximately 1.7, 2.3, or 3.3 kb.
These different species were considered to reflect natural size variants of the mouse message since
northern blots of LTA poly(A)+ RNA hybridized with 3-OST-1 probe revealed the same three size
categories of message. The complete sequencing of 9 distinct primary recombinants, at least 2
from each size category, in conjunction with the partial sequencing of the remaining 16 clones
showed that the size variants result from differences in the length of 5' untranslated region due to
the insertion of 0-1629 bp at a single common internal point, the splice variant site. Most
importantly, all clones shared identical protein coding regions and, therefore, the characterization
and analysis of only the shortest species, the Class 1 cDNA, which lacks additional sequence at
the splice variant site, is described below.

Sequence data was obtained from 2 essentially full length Class 1 cDNAs, and 5 partial
length cDNAs to create a composite cDNA structure of 1685 bp (SEQ ID NO: 1), excluding the
3' poly(A) tract. The 5' untranslated region is 322 bp with the splice variant site occurring
between nucleotides 216 and 217. This region contains 6 ATG sites which do not conform to
consensus initiation sites (53) and are followed by near in-frame termination codons. An open
reading frame of 933 bp begins at position 323 with the first consensus initiation ATG (a purine
occurs at -3) (53). The length of the 3' untranslated region from all of the cDNA clones analyzed
ranged from 301-430 bp. Within this terminal 129 bp, 5 distinct polyadenylation sites were
observed and 13-18 bp upstream from each site is a variant of the consensus polyadenylation
signal. Poly(A) tails were most frequently observed at the first site (position 1556, ~50% of
clones).
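
The segment lengths reported for the composite Class 1 mouse cDNA can be cross-checked with simple arithmetic. The sketch below uses only the numbers given in the text; the variable names are hypothetical, and the coordinates assume 1-based numbering of SEQ ID NO: 1 with the 3' untranslated region counted from the first base after the 933 bp open reading frame.

```python
# Values quoted in the text for the composite mouse Class 1 cDNA.
COMPOSITE_LENGTH = 1685   # bp, excluding the 3' poly(A) tract
FIVE_PRIME_UTR = 322      # bp
ORF_LENGTH = 933          # bp
FIRST_POLYA_SITE = 1556   # position of the most frequent poly(A) site

# Derived coordinates (1-based).
orf_start = FIVE_PRIME_UTR + 1            # first base of the initiation ATG
orf_end = FIVE_PRIME_UTR + ORF_LENGTH     # last base of the open reading frame
codons = ORF_LENGTH // 3                  # number of codons in the ORF

# 3' untranslated region lengths for the longest and shortest clones.
three_utr_longest = COMPOSITE_LENGTH - orf_end
three_utr_shortest = FIRST_POLYA_SITE - orf_end
polya_window = three_utr_longest - three_utr_shortest

if __name__ == "__main__":
    print("ORF:", orf_start, "-", orf_end, f"({codons} codons)")
    print("3' UTR range:", three_utr_shortest, "-", three_utr_longest, "bp")
    print("window containing the poly(A) sites:", polya_window, "bp")
```

The derived values reproduce the figures stated above: an open reading frame starting at position 323, a 3' untranslated region of 301-430 bp, and a terminal 129 bp window containing the polyadenylation sites.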

Isolation and Characterization of Human 3-OST-1 cDNAs

Three clones containing partial length human 3-OST-1 cDNAs were identified by EST
database searching (48) and were obtained from the TIGR/ATCC Special Collection, as described
above. Sequencing of the insert ends revealed the clones to be essentially equivalent, as each
contained the same 947 bp region of the human 3-OST-1 cDNA. The insert of I.M.A.G.E.
Consortium CloneID 220372 was 32P-labeled and used to screen 5 x 10^ plaques from a λ TriplEx
Brain cDNA library. Three positives were identified and isolated as TriplEx plasmids, and the
largest cDNA (1.3 kb) was sequenced completely.

The nucleic acid sequences of mouse and human 3-OST-1 cDNAs are ~85% identical. The
largest isolated human clone contains 118 bp of 5' untranslated region with 2 nonconsensus ATG
sites. The sequences of human and mouse cDNAs flanking the splice variant site on the 5' limit
are distinct (positions 211-216 of SEQ ID NO: 1 and positions 5-10 of SEQ ID NO: 3), but on
the 3' limit are identical (positions 217-222 of SEQ ID NO: 1 and positions 11-16 of SEQ ID NO:
3), which raises the possibility that human 3-OST-1 mRNA may also exhibit 5' splice variants.
The first consensus ATG (with a purine occurring at -3 and a G at +4) (53) initiates an open
reading frame of 921 bp. For all 4 human cDNA clones examined, only a single polyadenylation
site was observed, resulting in a 3' untranslated region of 266 bp, which is 26 bp less than the most
frequently observed 3' limit for the mouse cDNAs.
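
The initiation-codon criterion used above (a purine at -3 and, optionally, a G at +4 relative to the A of the ATG) can be illustrated with a short Python sketch. The function names and the toy sequence are hypothetical and are not taken from SEQ ID NO: 1 or SEQ ID NO: 3; the sketch only shows how such a scan might be performed, not how the analysis in this disclosure was actually carried out.

```python
from typing import Optional

PURINES = {"A", "G"}


def kozak_context(seq: str, atg_index: int) -> dict:
    """Describe the -3 and +4 bases around the ATG starting at atg_index (0-based)."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # The A of the ATG is +1 and there is no position 0, so -3 lies three bases upstream.
    minus3 = seq[atg_index - 3] if atg_index >= 3 else None
    # +4 is the base immediately following the ATG.
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else None
    return {
        "minus3": minus3,
        "plus4": plus4,
        "purine_at_minus3": minus3 in PURINES,
        "g_at_plus4": plus4 == "G",
    }


def first_consensus_atg(seq: str, require_g_at_plus4: bool = False) -> Optional[int]:
    """Return the 0-based index of the first ATG with a purine at -3, or None."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        ctx = kozak_context(seq, start)
        if ctx["purine_at_minus3"] and (ctx["g_at_plus4"] or not require_g_at_plus4):
            return start
        start = seq.find("ATG", start + 1)
    return None


# Hypothetical toy sequence: the first ATG has a pyrimidine at -3 (nonconsensus),
# the second has a purine at -3 and a G at +4.
toy = "TTTCCCATGTTTGCCACCATGGCG"
print(first_consensus_atg(toy, require_g_at_plus4=True))
```

With a real cDNA, upstream ATGs that fail the check correspond to the nonconsensus ATG sites noted in the 5' untranslated region.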

Predicted Protein Structures of Mouse and Human 3-OST-1

The mouse and human cDNAs encode novel 311 and 307 amino acid proteins of 35,876
and 35,750 daltons, respectively, that exhibit 93% similarity. The deduced mouse primary
structure contains regions corresponding to all 13 sequenced peptides and the amino terminus.
For both types of 3-OST-1, the encoded protein is predicted to be an intraluminal resident.
Kyte-Doolittle hydropathy analysis reveals only a single major hydrophobic region, which begins
at the amino terminus and lacks sufficient length for a membrane spanning domain. Moreover, the
hydrophobic region differs from a membrane anchor in that it contains two glutamine residues and
is not flanked by cationic residues. Thus, the above stretch of 18 residues constitutes a
hydrophobic leader signal, and this region is followed by a signal peptidase cleavage site between
amino acids 20 and 21, as determined by the method of von Heijne (54). The possibility of signal
peptidase cleavage is supported by the amino-terminal analysis of mouse 3-OST-1, which began
with His^V. Given that heparan biosynthesis is considered to occur in the trans-Golgi, the above
data suggest that the 3-OST-1 is an intraluminal enzyme. Just past the signal peptidase cleavage
site, the mouse 3-OST-1 contains an extra 4 residues (Asp^'^-Pro^^-Gly^-Pro^^^) not found in the
human form. Both 3-OST-1 proteins exhibit 5 potential N-glycosylation sites, which account for
the apparent discrepancy between the molecular weights of the predicted amino terminus trimmed
enzyme (~34 kDa) and the previously purified enzyme (a broad band of 46 kDa was observed on
SDS-PAGE) (26). Only two cysteine residues are present, and these closely spaced residues are
likely to form a disulfide bond which generates a peptide loop of 10 amino acids. Interestingly,
the carboxy 140 residue region is extremely basic (25% H, K, R; 12% E, D); however, this region
does not exhibit previously recognized heparin binding motifs.
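
Two of the figures above lend themselves to a quick arithmetic check: the protein lengths implied by the open reading frames, and the basic/acidic composition of a region. The Python sketch below assumes that the reported ORF lengths (933 bp and 921 bp) exclude the stop codon, which is what the stated 311 and 307 residue lengths suggest. The helper names and the toy peptide are hypothetical and do not come from SEQ ID NO: 2 or SEQ ID NO: 4.

```python
from typing import Tuple


def residues_from_orf(orf_bp: int) -> int:
    """Number of encoded residues for an ORF length that excludes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3


def charge_composition(peptide: str) -> Tuple[float, float]:
    """Return (fraction of H/K/R, fraction of E/D) for a one-letter peptide string."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("empty peptide")
    basic = sum(peptide.count(residue) for residue in "HKR")
    acidic = sum(peptide.count(residue) for residue in "ED")
    return basic / len(peptide), acidic / len(peptide)


print(residues_from_orf(933), residues_from_orf(921))

# Hypothetical 12-residue peptide used only to exercise the function.
basic_fraction, acidic_fraction = charge_composition("HKRRAAEDAAAA")
print(round(basic_fraction, 3), round(acidic_fraction, 3))
```

Applied to the carboxy-terminal 140 residues of the deduced sequences, the second function would be expected to return values near the 25% and 12% figures quoted above.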

Recombinant Expression of Mouse and Human 3-OST-1 Enzyme (r3-OST-1)

Three distinct expression approaches were employed to confirm that the isolated cDNAs
encode 3-OST-1 enzyme. The resulting recombinantly expressed 3-OST-1 enzyme was
designated as r3-OST-1, to distinguish this form from the previously purified native 3-OST-1
enzyme. First, the vector pCMV-3-OST (a pcDNA3 derivative in which the CMV promoter
transcribes the mouse 3-OST-1 cDNA) was transiently expressed in COS-7 cells and the resulting
level of HS^act conversion activity accumulated in Serum-Free Medium over 32 h was measured, as
described above. HS^act conversion activity is a 3-OST-1 catalyzed reaction which requires
unlabeled PAPS to convert 35S-HS^inact into 35S-HS^act. Before or after pcDNA3 transfection,
COS-7 conditioned Serum-Free Medium typically contained a low but detectable amount of HS^act
conversion activity, whereas transfection by pCMV-3-OST elevated levels ~2,000-fold.

Second, to exclude the remote possibility that the expression of the mouse 3-OST-1
cDNA indirectly induces, rather than directly encodes, HS^act conversion activity, a Protein
A/3-OST-1 fusion protein was analyzed. COS-7 cells were transiently transfected with
pCMV-ProA3-OST, a pCMV-3-OST derivative in which the amino-terminal 26 residues of the mouse
3-OST-1 are replaced with a Protein A tag, and Protein A tagged mouse r3-OST-1 was extracted
with IgG agarose beads from 155 ml of conditioned Serum-Free Medium, as described above.
The affinity purification recovered undetectable and less than 0.5% of initial HS^act conversion
activity from control pcDNA3 and pCMV-3-OST transfection samples, respectively, whereas
~7,000 units (10% recovery) were extracted from pCMV-ProA3-OST transfection samples.
Thus, the mouse 3-OST-1 cDNA directly encodes HS^act conversion activity.

Third, the activities of cell-free synthesized mouse and human r3-OST-1 were examined.
Synthetic capped mouse and human 3-OST-1 mRNAs were generated by in vitro transcription
and then in vitro translated with reticulocyte lysate in the presence and absence of canine
pancreatic microsomal membranes, as described above. HS^act conversion activity was
undetectable in the control in vitro translation reactions which lacked mRNA template, with or
without microsomal membranes. A low level of HS^act conversion activity resulted from the addition
of synthetic 3-OST-1 mRNA templates to translation reactions lacking microsomal membranes
(mouse, 0.86 ± 0.028 units/µl, n = 3; human, 2.1 ± 0.063 units/µl, n = 3); however, ~15-fold
greater levels occurred when microsomal membranes were included in translation reactions
(mouse, 14.3 ± 0.27 units/µl, n = 3; human, 32.4 ± 2.1 units/µl, n = 3). The apparent activation
of nascent r3-OST-1 by co-translational processing within microsomes may result from signal
peptidase cleavage, N-linked glycosylation, and/or a facilitation of correct protein folding. The
slightly greater production from the human 3-OST-1 cDNA may reflect the more favorable
context of the human initiation codon, or the reduced length of the human 5' untranslated region.
Independent of the above considerations, the above data confirm that isolated mouse and human
cDNAs encode HS^act conversion activity.
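
The ~15-fold figure follows directly from the mean activities reported for the in vitro translation reactions. A minimal Python sketch of that ratio is given below; the dictionary layout and function name are illustrative choices, and only the four mean values are taken from the text.

```python
# Mean HS(act) conversion activities in units per microliter, as reported above:
# (without microsomal membranes, with microsomal membranes).
MEAN_ACTIVITY = {
    "mouse": (0.86, 14.3),
    "human": (2.1, 32.4),
}


def fold_activation(without_membranes: float, with_membranes: float) -> float:
    """Ratio of activity with microsomal membranes to activity without them."""
    if without_membranes <= 0:
        raise ValueError("baseline activity must be positive")
    return with_membranes / without_membranes


for species, (minus, plus) in MEAN_ACTIVITY.items():
    print(species, round(fold_activation(minus, plus), 1))
```

Both ratios fall between 15 and 17, in line with the approximate value stated in the text. The same table also shows the human template yielding somewhat more activity than the mouse template under either condition.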

Next, the biochemical specificity of the HS^act conversion activity generated from each
expression approach was examined by incubating crude or purified enzyme with [35S]PAPS and
unlabeled HS^inact, recovering radiolabeled GAG by DEAE chromatography, and characterizing the
resultant products. The HS^act conversion activity of the wild-type mouse r3-OST-1 produced by
transfecting COS-7 cells with pCMV-3-OST (1.35 x 10^6 units in 240 ml of conditioned
Serum-Free Medium) was first purified away from potential contaminating sulfotransferase activities by
heparin-AF Toyopearl chromatography followed by 3',5'-ADP-agarose chromatography, which
yielded ~1 µg of protein containing 340,000 units (~20,000-fold purification with 25% overall
recovery); whereas the IgG agarose-purified Protein A tagged r3-OST-1 and in vitro translation
reactions of mouse and human 3-OST-1 mRNA templates were directly analyzed, as described
above. About 0.5 - 1 x 10^ cpm of product was generated with purified wild-type r3-OST-1,
purified Protein A tagged r3-OST-1, and nonpurified in vitro translation reactions containing
mouse and human r3-OST-1, respectively. Portions of each labeled product were incubated with
purified heparitinase (0.5 units/ml) or chondroitinase ABC (0.5 units/ml), and HPLC-GPC analysis
indicated that in all cases label was exclusively incorporated into HS. Portions of the labeled HS
samples were also N-desulfated with nitrous acid at pH 1.5, and analyzed by P-2 polyacrylamide
gel filtration to determine the amounts of liberated free [35S]sulfate, as described above. The
results demonstrated no increased generation of free [35S]sulfate. Finally, portions of the labeled
samples were AT affinity fractionated, which revealed that in each case ~40% of the 35S-label was
incorporated in HS^act and ~60% of the 35S-label was incorporated in HS^inact. The
labeled HS^act and HS^inact generated by the wild-type purified r3-OST-1 were chemically cleaved to
disaccharides with nitrous acid treatment, appropriate 3H-labeled disaccharide standards were
added, and the 35S- and 3H-labeled species were coresolved by RPIP-HPLC as outlined above.
The results show that the 35S-label coelutes with [3H]GlcA->AMN-3-O-SO3 and
[3H]GlcA->AMN-3,6-O-(SO3)2, respectively. This approach also revealed that Protein A tagged
r3-OST-1, and in vitro translation derived mouse and human r3-OST-1, generated 35S-HS which
only contained 35S-labeled disaccharides that coeluted with [3H]GlcA->AMN-3-O-SO3 and
[3H]GlcA->AMN-3,6-O-(SO3)2, respectively. It was previously shown that 35S-labeled
GlcA->AMN-3,6-O-(SO3)2 generated by purified 3-OST-1 enzyme contains 35S solely in the
3-O-position (26). Thus, the expressed HS^act conversion activities exclusively catalyze the transfer of
sulfate to the 3-O-position of glucosamine units in HS^act and HS^inact.
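
The purification figures quoted for the wild-type r3-OST-1 can be cross-checked with simple arithmetic. The Python sketch below assumes a starting total of 1.35 x 10^6 units; the exponent is an inference, chosen because it is the value consistent with the stated 25% overall recovery of 340,000 units. The starting specific activity it prints is a back-calculation from the approximate ~1 µg and ~20,000-fold figures, not a reported measurement, and the variable names are illustrative.

```python
# Figures taken from the purification described above.
FINAL_UNITS = 340_000.0      # units recovered after both chromatography steps
FINAL_PROTEIN_UG = 1.0       # approximately 1 microgram of protein
REPORTED_RECOVERY = 0.25     # 25% overall recovery
REPORTED_FOLD = 20_000.0     # approximately 20,000-fold purification

# Assumption: the starting activity exponent is 6 (1.35 x 10^6 units).
ASSUMED_START_UNITS = 1.35e6

# Starting total implied by the reported yield and recovery.
implied_start_units = FINAL_UNITS / REPORTED_RECOVERY

# Recovery implied by the assumed starting total.
recovery = FINAL_UNITS / ASSUMED_START_UNITS

# Specific activities in units per microgram of protein.
final_specific_activity = FINAL_UNITS / FINAL_PROTEIN_UG
implied_start_specific_activity = final_specific_activity / REPORTED_FOLD

print(f"implied starting total: {implied_start_units:.3g} units")
print(f"recovery from assumed start: {recovery:.1%}")
print(f"final specific activity: {final_specific_activity:.3g} units/ug")
print(f"implied starting specific activity: {implied_start_specific_activity:.3g} units/ug")
```

The implied starting total (about 1.36 x 10^6 units) agrees with the assumed exponent to within rounding, which is the only basis offered here for that reading.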
Northern Analysis of Rodent and Human 3-OST-1 Expression

Northern blot analysis reveals the presence of 3-OST-1 message in different kinds of
endothelial cells as well as a mast cell line. These cell types have previously been shown to form
HS^act and anticoagulant heparin, respectively (6, 8, 55). Three size categories of rodent 3-OST-1
mRNA (about 1.7, 2.3, and 3.3 kb) and a single size species of the human message (about 1.7 kb)
were evident. As described above, the mouse forms arise from differential splicing within the 5'
untranslated region. Similar size categories are also expressed by rat (RFPEC) endothelial cells,
suggesting a similar mechanism of origin. The abundance of each category varies with each cell
line, which suggests that a mechanism exists to regulate such differential splicing. The
immortalized mouse mast cell line, C57.1, expresses high levels of the same three size categories,
which suggests that expression of a single 3-OST-1 gene is required for the synthesis of both
HS^act and anticoagulant heparin.

The 3-OST-1 Sequence Defines a Heparan Sulfotransferase Family

Extensive computer-aided data bank searching revealed the 3-OST-1 protein to be a
previously unidentified protein; furthermore, the carboxy-terminal 250 residues exhibit a low
homology (~30% similarity) to many previously identified sulfotransferases (which are typically
~300 residues in length) including chondroitin-, aryl-/phenol-, N-hydroxyarylamine-,
alcohol-/hydroxysteroid-, flavonol-, and nodulation factor sulfotransferases. We also observed a slightly
greater homology (~40% similarity) to a functionally unidentified open reading frame of 247
amino acids from Aeromonas salmonicida (GenBank accession number L37077). More
importantly, the 3-OST-1 protein exhibits ~50% similarity with all previously identified forms of
the heparan biosynthetic enzyme N-deacetylase/N-sulfotransferase (NST). In particular, extensive
homology exists across the entire 250-270 carboxy-terminal residues of these enzymes. Thus, it
appears that a common sulfotransferase structure is shared by two distinct types of heparan
biosynthetic enzyme. Given that NST is a bifunctional enzyme, the above observation suggests
that NST enzymes possess sulfotransferase activity within a ~270 residue carboxy-terminal
domain, whereas deacetylase activity would be contained within the remaining ~560 luminal
residues. Interestingly, the region of consensus Lys -Arg , which encompasses the
presumptive cysteine bridged peptide loop (described above), exhibits complete conservation for
12 of the 22 residues (including both cysteines) among all 3-OST-1 and NST species.

Identification and molecular cloning of 3-OST-2, 3-OST-3A, 3-OST-3B and 3-OST-4

The 3-OST-1 protein exhibits a COOH-terminal region of ~260 residues which was
determined to be a sulfotransferase (ST) domain based on homology to all known
sulfotransferases. The National Center for Biotechnology Information data bank of expressed
sequence tags (ESTs) was searched with amino acid sequences of the ST domain from the human
3-OST-1 cDNA to reveal seven human cDNAs encoding three novel related species. The forms
were subsequently designated as 3-OST-2 (I.M.A.G.E. Consortium (LLNL) Clone ID c-20dl0),
3-OST-3 (Clone ID 284542) and 3-OST-4 (Clone IDs HIBCX69, IB727, 166466, 23279, and
c-3ie01). These EST clones were obtained from the TIGR/ATCC Special Collection, and the
inserts were completely sequenced, revealing that all clones were of partial length.

To obtain full length clones, isoform specific probes were generated from the EST clones
and used to screen λ TriplEx human cDNA libraries. Seven additional 3-OST-2 cDNAs and four
additional 3-OST-4 cDNAs were isolated from a brain library, and 8 new 3-OST-3 cDNAs were
recovered from a liver library. The cDNA inserts were completely sequenced, revealing the full
length form for 3-OST-2 as well as 2 distinct full length forms for 3-OST-3 (3-OST-3A and
3-OST-3B). The additional 3-OST-4 clones were also of partial length.

3-OST-2, 3-OST-3A, 3-OST-3B and 3-OST-4 Protein Structures and Activities

The 3-OST-2, 3-OST-3A, and 3-OST-3B proteins are 367, 406, and 390 amino acids in
length, respectively. All three proteins conform to the architecture of a type-II integral membrane
protein. These proteins and the partial length 3-OST-4 share a common (85% similarity) ST
domain region of ~260 amino acids at their COOH-terminus. To characterize the encoded HS
sulfotransferase activities, the 3-OST-2, 3-OST-3A, and 3-OST-3B cDNAs were individually
expressed in COS-7 cells.

The analysis of transfected cell extracts demonstrated that each enzyme transfers sulfate
specifically to the 3-O position of glucosamine residues within HS; however, distinct specificities
occur. 3-OST-2 preferentially sulfates regions containing GlcA 2S->GlcNS to generate
GlcA 2S->GlcNS 3S, whereas both 3-OST-3A and 3-OST-3B recognize regions with
IdoA 2S->GlcNS to generate IdoA 2S->GlcNS 3S.
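
The substrate preferences just described can be summarized as a small lookup table. The Python sketch below is only an illustrative data structure: the names `SPECIFICITY` and `enzymes_recognizing` are hypothetical, and the entries restate the disaccharide substrates and products given above without adding any specificity data.

```python
from typing import Dict, List, Tuple

# isoform -> (preferred substrate disaccharide, 3-O-sulfated product)
SPECIFICITY: Dict[str, Tuple[str, str]] = {
    "3-OST-2": ("GlcA 2S->GlcNS", "GlcA 2S->GlcNS 3S"),
    "3-OST-3A": ("IdoA 2S->GlcNS", "IdoA 2S->GlcNS 3S"),
    "3-OST-3B": ("IdoA 2S->GlcNS", "IdoA 2S->GlcNS 3S"),
}


def enzymes_recognizing(substrate: str) -> List[str]:
    """Return the isoforms whose preferred substrate matches the given disaccharide."""
    return sorted(
        isoform
        for isoform, (preferred, _product) in SPECIFICITY.items()
        if preferred == substrate
    )


print(enzymes_recognizing("IdoA 2S->GlcNS"))
print(SPECIFICITY["3-OST-2"][1])
```

In every entry the product differs from the substrate only by the added 3S on the glucosamine, which reflects the shared 3-O-sulfotransferase activity of the isoforms.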
Expression Patterns Indicate Biological Function

The biologic function of these novel enzymes was elucidated by performing northern blot
analysis. 3-OST-4 is exclusively expressed in the brain, whereas 3-OST-2 mRNA predominantly
occurs in the brain with minor levels also found in heart, lung, skeletal muscle and placenta.
3-OST-3 forms occur in virtually all tissues but with barely detectable levels in brain, low levels in
heart, lung, skeletal muscle and kidney, and extremely abundant expression in liver and placenta.
Thus 3-OST-2 and 3-OST-4 appear to be the brain counterparts of 3-OST-3. The product of
3-OST-3 (IdoA 2S->GlcNS 3S) has previously been shown to be extremely abundant in HSPGs
isolated from the glomerular basement membrane (GBM) of the kidney. These HSPGs are critical
to regulating the permselectivity of the GBM. This function occurs through interactions with
extracellular matrix components that regulate the pore size of the matrix. Given that the liver,
placenta, and kidney glomerulus are all responsible for the filtration of macromolecular
components from blood and all exhibit high 3-OST-3 expression, it appears that 3-OST-3 serves a
common function in each situation: to regulate macromolecular permeability. In this functional
regard, the high brain expression of 3-OST-2 and 3-OST-4 correlates with the major molecular
permeability barrier of the central nervous system, the blood brain barrier.
Therapeutic Utilities

The 3-OST heparan biosynthetic enzymes may be generated by recombinant expression of
the isolated cDNAs to generate novel glycosaminoglycan drugs of specific structure through an in
vitro biochemical synthesis approach. Specifically, 3-OST-1 may be used to generate
anticoagulant pentasaccharides, which may be administered subcutaneously to treat thrombotic
disorders such as deep vein thrombosis and pulmonary embolism. The 3-OST-1 enzyme may also
be used to generate an orally absorbable form of pentasaccharide from an appropriate
carbohydrate substrate linked to a hydrophobic group. In an analogous fashion, specific
glycosaminoglycan products may be generated from 3-OST-2, 3-OST-3 and 3-OST-4, which may
be used as therapeutics to alter macromolecular permeability of various vascular beds. Drugs
which reduce capillary permeability may, at the very least, be used to treat (1) microproteinuria
and macroproteinuria of renal diseases including diabetic nephropathy and the various forms of
glomerulonephritis; (2) neoplastic growths by limiting nutrient supply to tumors; and (3)
inflammatory diseases where macromolecular constituents of the plasma are required for initiating
and maintaining a localized inflammation. Conversely, drugs which enhance capillary permeability
may be used (1) as an adjunctive treatment to facilitate pharmacological access to vascular beds
which exhibit highly selective drug entry, such as the blood brain barrier and the placental barrier;
and (2) to enhance nutrient supply to under-perfused tissues such as the myocardium after an
infarct.

Specific heparan sulfate structures regulate additional biologic processes by interacting
with numerous protein effector molecules including growth and differentiation factors (e.g., FGF
family members, HB-EGF, HGF/SF, interferon γ, PDGF, SDGF, and VEGF/VPF), chemokines
(e.g., MIP-1β, RANTES, and GRO), receptors (e.g., TGF-β receptors), mast cell proteases,
protease inhibitors (e.g., AT, heparin cofactor II, leuserpin, plasminogen activator inhibitor-1,
protease nexins), degradative enzymes (e.g., elastase, acetylcholinesterase, extracellular
superoxide dismutase, thrombin, tissue plasminogen activator, lipoprotein lipase, hepatic and
pancreatic triglyceride lipase, and cholesterol esterase), apolipoproteins (e.g., apoB and apoE),
matrix components (e.g., fibronectin, wnt-1, interstitial collagens, laminin, pleiotrophin, tenascin,
thrombospondin, and vitronectin), viral coat proteins (e.g., gC and gB of HSV types I and II,
gC-II of CMV, and gp120 of HIV), nuclear proteins (e.g., c-fos, c-jun, RNA and DNA polymerases,
and steroid receptors), cellular adhesion molecules (e.g., L-selectin, P-selectin, PECAM-1, and
N-CAM) and other molecules (e.g., HB-GAM/pleiotrophin, amphoterin, and PF4).

Using routine methods (e.g., site-directed mutagenesis) the available 3-OST cDNAs may
be selectively mutated to alter substrate recognition properties so as to produce enzymes that
generate novel glycosaminoglycan structures which modulate the biologic processes regulated by
the above effector molecules. Thus, novel drugs may also be biochemically synthesized from
recombinantly expressed mutated enzymes. Such substances may serve to (1) enhance growth or
regeneration of specific cell types such as the endothelial cells of the heart after infarction, or
neurons in neurodegenerative diseases; (2) suppress undesirable cell growth in conditions such as
cancer (either directly by acting on the cancer cells or indirectly by preventing endothelial cells
from neovascularizing the tumor), atherosclerosis (by preventing smooth muscle cell growth), and
inflammatory diseases characterized by cellular proliferation; (3) prevent metastasis of tumors by
modulating cell/matrix interactions; (4) reduce the destructive side effects of inflammatory
reactions by inhibiting degradative enzymes or by activating inhibitory molecules (e.g., protease
inhibitors) which may be directly or indirectly protective by limiting extravasation of lymphocytes;
(5) modulate serum lipid levels by enhancing or reducing the cellular or tissue uptake or
degradation of specific lipoprotein classes; (6) treat viral infections by preventing viral entry into
cells; and (7) facilitate axon regeneration subsequent to nerve severing.

Bacterial expression of 3-OST-1. The human and mouse 3-OST-1 proteins have been
expressed as active, soluble protein in E. coli. This has been achieved using the pET system from
NOVAGEN (Madison, WI). The human and mouse 3-OST-1 cDNAs were PCR amplified with
Pfu DNA polymerase and purified cloned plasmids as template. The primers that were used were
designed to amplify a cDNA fragment starting, in frame, after the native signal sequence and
including the native translational termination codon. Additionally, the PCR primers were
designed to include restriction sites that would facilitate cloning into the vectors described below
in the correct transcriptional/translational reading frames. 3-OST-1 was cloned into vectors
pET12a, 15b and 28a according to the manufacturer's instructions. This places the 3-OST-1
cDNA downstream of a powerful, inducible T7 transcription site and includes an efficient
Shine-Dalgarno sequence at the appropriate distance from the initiator methionine of the construct.
Good yields of active protein result from IPTG induction at room temperature. The
specific activity appears to be less than that of purified or Baculovirus/Sf9 produced material. The
exact magnitude of the diminution of activity is unclear at this time; however, it may be 10-1000 fold.
The presently preferred purification scheme is: (1) Induction at 22 °C. (2) Sonication of
bacteria, centrifugation to remove inclusion bodies and cell debris, purification of crude bacterial
sonicate on heparin sepharose as described elsewhere. (3) PAP column chromatography. (4)
Gel permeation chromatography. Step (4) is only needed for obtaining monomeric, pure
3-OST-1, and not for active protein preparation.

CLAIMS

What is claimed is: 



1. An isolated nucleic acid encoding at least a functional fragment of a 3-OST protein.

2. An isolated nucleic acid as in claim 1 wherein said nucleic acid encodes a 3-OST protein
comprising a mature 3-OST-1 protein selected from the group consisting of mature murine
3-OST-1 and mature human 3-OST-1.

3. An isolated nucleic acid as in claim 1 wherein said nucleic acid encodes a 3-OST protein
comprising a protein selected from the group consisting of 3-OST-1, 3-OST-2, 3-OST-3A,
3-OST-3B, 3-OST-4, and ce3-OST.

4. An isolated nucleic acid as in claim 1 wherein said nucleic acid encodes a
3-O-sulfotransferase domain of a 3-OST protein selected from the group consisting of 3-OST-1,
3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST.

5. An isolated nucleic acid as in claim 1 wherein said nucleic acid comprises a nucleotide
sequence selected from nucleotide sequences within:
    (a) SEQ ID NO: 1;
    (b) SEQ ID NO: 3;
    (c) SEQ ID NO: 5;
    (d) SEQ ID NO: 7;
    (e) SEQ ID NO: 9;
    (f) SEQ ID NO: 11;
    (g) a sequence having at least 60% nucleotide sequence identity with at least one of
(a)-(f) and encoding a functional fragment having sequence-specific HS binding affinity or
3-O-sulfotransferase activity; and
    (h) a sequence differing from a sequence of (a)-(g) only by the substitution of
synonymous codons.

6. An isolated nucleic acid as in claim 1 wherein said nucleic acid comprises a nucleotide
sequence encoding a polypeptide selected from the group consisting of:
    (a) residues 21-52, 260-269, 250-276, 53-311, or 21-307 of SEQ ID NO: 2;
    (b) residues 21-48, 256-265, 246-272, 49-307, or 21-303 of SEQ ID NO: 4;
    (c) residues 42-109, 313-325, 303-332, or 110-367 of SEQ ID NO: 6;
    (d) residues 44-147, 351-363, 341-370, or 148-406 of SEQ ID NO: 8;
    (e) residues 66-132, 336-348, 326-355, or 133-390 of SEQ ID NO: 10;
    (f) residues 396-408, 386-415, or 207-456 of SEQ ID NO: 12;
    (g) residues 240-250, 230-257, 23-291 of SEQ ID NO: 15;
    (h) a sequence having at least 60% amino acid sequence similarity with at least one of
(a)-(g) and encoding a functional fragment having sequence-specific HS binding affinity or
3-O-sulfotransferase activity; and
    (i) a sequence comprising a chimera of at least two of sequences (a)-(h).

7. An isolated nucleic acid comprising at least 16 consecutive nucleotides of a nucleotide
sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5,
SEQ ID NO: 7, SEQ ID NO: 9, and SEQ ID NO: 11.

8. A host cell transformed with a nucleic acid of any one of claims 1-7, or a descendant
thereof.

9. A host cell as in claim 8 wherein said host cell is selected from the group consisting of
bacterial cells, yeast cells, and insect cells.

10. A host cell as in claim 8 wherein said host cell is selected from the group consisting of
somatic cells, fetal cells, embryonic stem cells, zygotes, gametes, germ line cells, and transgenic
animal cells.

11. A host cell as in claim 8 wherein said cell is a mammalian cell.

12. A host cell as in claim 11 wherein said cell is selected from the group consisting of COS-7
cells, CHO, murine primary cardiac microvascular endothelial cells (CME), murine mast cell line
C57.1, human primary endothelial cells of umbilical vein (HUVEC), F9 embryonal carcinoma
cells, rat fat pad endothelial cells (RFPEC), L cells, and cells derived from the transgenic animals
of the invention.

13. A substantially pure protein preparation comprising at least a functional fragment of a
3-OST protein.

14. A substantially pure protein preparation as in claim 13 wherein said 3-OST protein is
selected from the group consisting of mature murine 3-OST-1 and mature human 3-OST-1.

15. A substantially pure protein as in claim 13 wherein said 3-OST protein is selected from the
group consisting of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST.

16. A substantially pure protein preparation as in claim 13 wherein said functional fragment
comprises a 3-O-sulfotransferase domain of a 3-OST protein selected from the group consisting
of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST.

17. A substantially pure protein preparation as in claim 13 wherein said functional fragment
comprises an amino acid sequence selected from amino acid sequences within:
    (a) SEQ ID NO: 2;
    (b) SEQ ID NO: 4;
    (c) SEQ ID NO: 6;
    (d) SEQ ID NO: 8;
    (e) SEQ ID NO: 10;
    (f) SEQ ID NO: 12;
    (g) SEQ ID NO: 15;
    (h) a sequence having at least 60% amino acid similarity with at least one of (a)-(g) and
having sequence-specific HS binding affinity or 3-O-sulfotransferase activity; and
    (i) a sequence comprising a chimera of at least two of sequences (a)-(h).

18. A substantially pure protein preparation as in claim 13 wherein said functional fragment
comprises an amino acid sequence selected from the group consisting of:
    (a) residues 21-52, 260-269, 250-276, 53-311, or 21-307 of SEQ ID NO: 2;
    (b) residues 21-48, 256-265, 246-272, 49-307, or 21-303 of SEQ ID NO: 4;
    (c) residues 42-109, 313-325, 303-332, or 110-367 of SEQ ID NO: 6;
    (d) residues 44-147, 351-363, 341-370, or 148-406 of SEQ ID NO: 8;
    (e) residues 66-132, 336-348, 326-355, or 133-390 of SEQ ID NO: 10;
    (f) residues 396-408, 386-415, or 207-456 of SEQ ID NO: 12;
    (g) residues 240-250, 230-257, 23-291 of SEQ ID NO: 15;
    (h) a sequence having at least 60% amino acid sequence similarity with at least one of
(a)-(g) and encoding a functional fragment having sequence-specific HS binding affinity or
3-O-sulfotransferase activity; and
    (i) a sequence comprising a chimera of at least two of sequences (a)-(h).

19. A method of 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan
or proteoglycan polysaccharides comprising:
    contacting said preparation with at least a 3-O-sulfotransferase domain of a 3-OST protein
in the presence of a sulfate donor under conditions which permit sulfation of said residues;
    wherein said 3-OST protein is selected from the group consisting of 3-OST-1, 3-OST-2,
3-OST-3A, 3-OST-3B, 3-OST-4, ce3-OST, and conservative substitution variants or chimeras
thereof.

20. A method of 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan
or proteoglycan polysaccharides, wherein said polysaccharides include a polysaccharide sequence
of GlcA->GlcNS ±6S comprising:
    contacting said preparation with a 3-OST-1 protein in the presence of a sulfate donor
under conditions which permit said 3-OST-1 to convert said GlcA->GlcNS ±6S sequence to
GlcA->GlcNS 3S ±6S,
    wherein the 3-OST-1 protein is selected from the group consisting of murine 3-OST-1,
human 3-OST-1, mature murine 3-OST-1, mature human 3-OST-1, a functional fragment of a
3-OST-1 having 3-O-sulfotransferase activity, a conservative substitution variant of 3-OST-1
having 3-O-sulfotransferase activity, and a chimeric 3-OST-1 having 3-O-sulfotransferase activity.

21. A method as in claim 20, wherein said GlcA->GlcNS ±6S polysaccharide sequence
comprises a part of a polysaccharide sequence selected from the group consisting of:
    (a) GlcA->GlcNS ±6S->IdoA 2S->GlcNS ±6S;
    (b) IdoA->GlcNAc 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S;
    (c) IdoA->GlcNS 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S;
    (d) IdoA->GlcNAc->GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S;
    (e) IdoA->GlcNS->GlcA->GlcNS ±6S->IdoA 2S->GlcNS 6S;
    (f) IdoA->GlcNAc 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS; and
    (g) IdoA->GlcNS 6S->GlcA->GlcNS ±6S->IdoA 2S->GlcNS.

22. A method of 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan
or proteoglycan polysaccharides, wherein said polysaccharides include a polysaccharide sequence
of GlcA 2S->GlcNS comprising:
    contacting said preparation with a 3-OST-2 protein in the presence of a sulfate donor
under conditions which permit said 3-OST-2 to convert said GlcA 2S->GlcNS sequence to
GlcA 2S->GlcNS 3S,
    wherein the 3-OST-2 protein is selected from the group consisting of 3-OST-2, a
functional fragment of a 3-OST-2 having 3-O-sulfotransferase activity, a conservative substitution
variant of 3-OST-2 having 3-O-sulfotransferase activity, and a chimeric 3-OST-2 having
3-O-sulfotransferase activity.

23. A method as in claim 22, wherein said GlcA 2S->GlcNS polysaccharide sequence
comprises a part of a GlcNS->GlcA 2S->GlcNS polysaccharide sequence.

24. A method of 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan
or proteoglycan polysaccharides, wherein said polysaccharides include a polysaccharide sequence
of IdoA 2S->GlcNS comprising:
    contacting said preparation with a 3-OST-3 protein in the presence of a sulfate donor
under conditions which permit said 3-OST-3 to convert said IdoA 2S->GlcNS sequence to
IdoA 2S->GlcNS 3S,
    wherein the 3-OST-3 protein is selected from the group consisting of 3-OST-3A,
3-OST-3B, a functional fragment of a 3-OST-3 having 3-O-sulfotransferase activity, a conservative
substitution variant of 3-OST-3 having 3-O-sulfotransferase activity, and a chimeric 3-OST-3
having 3-O-sulfotransferase activity.

25. A method as in claim 24, wherein said IdoA 2S->GlcNS polysaccharide sequence
comprises a part of a GlcNS->IdoA 2S->GlcNS polysaccharide sequence.

1 26. A method for enriching the AT-binding fraction in a preparation of heparan sulfates, 

2 wherein said preparation includes a polysaccharide sequence of GlcA— >GlcNS db6S comprising: 

3 contacting said preparation with 3-OST-l protein in the presence of a sulfate donor under 

4 conditions which permit said 3-OST-l to convert said GlcA-^GlcNS ±6S sequence to 

5 GlcA-»GlcNS 3S ±6S, thereby increasing the fraction of AT-binding heparan sulfates. 

1 27. A method for converting HS"* precursor to HS"* in a preparation of heparan sulfates, 

2 wherein said preparation includes HS*^ precursor polysaccharides including a sequence of 

3 GlcA->GlcNS ±6S comprising:

4 contacting said preparation with 3-OST-1 protein in the presence of a sulfate donor under

5 conditions which permit said 3-OST-1 to convert said GlcA->GlcNS ±6S sequence to

6 GlcA->GlcNS 3S ±6S, thereby converting HS'*'* precursor to HS'^. 

1 28. A method as in any one of claims 19-27 wherein said sulfate donor is PAPS.

1 29. A non-human animal model, wherein a genome of said animal, or an ancestor thereof, 

2 wherein said recombinant construct has introduced a modification into said genome, said 

3 modification selected from the group consisting of insertion of a nucleic acid encoding at least a 

4 functional fragment of a conspecific wild type 3-OST protein, insertion of a nucleic acid encoding 

5 at least a functional fragment of a transpecific allelic variant of the 3-OST sequences, insertion of 

6 nucleic acid encoding at least a functional fragment of an allelic variant of 3-OST sequence, 

7 inactivation of an endogenous 3-OST gene, and insertion by homologous recombination of a 

8 reporter gene coupled to 3-OST transcriptional elements. 

1 30. An animal as in claim 29 wherein said modification is insertion of nucleic acid encoding at 

2 least a functional fragment of wild type 3-OST selecting from the sequence consisting of the 

3 SPLAG-domain, the cysteine-binding peptide loop, and the ~260 residue ST domain.

1 31. An animal as in claim 29 wherein said animal is selected from the group consisting of rats,

2 mice, hamsters, guinea pigs, rabbits, dogs, cats, goats, sheep, pigs, and non-human primates.

1 32. An animal as in claim 29 wherein said animal is an invertebrate. 

1 33. A method of producing antibodies which selectively bind to a 3-OST protein comprising 

2 the steps of 

3 administering an immunogenically effective amount of a 3-OST epitope to an animal;

4 allowing said animal to produce antibodies to said epitope; and 

5 obtaining said antibodies from said animal or from a cell culture derived therefrom. 

1 34. A substantially pure preparation of antibody which selectively binds to an epitope of a 3- 

2 OST protein. 

1 35. A substantially pure preparation of an antibody as claimed in 34 wherein said antibody 

2 selectively binds to at least a fragment of 3-OST. 

1 36. A cell line producing an antibody of any one of the claims 34. 
1 37. A method for identifying compounds which can modulate the expression of a 3-OST gene

2 comprising steps of 

3 providing a cell expressing a nucleic acid under the control of a 3-OST regulatory 

4 sequence; 

5 contacting said cell with at least one candidate compound; and 

6 assaying for a change in the expression of said nucleic acid.

1 38. The method of claim 37, wherein said nucleic acid comprises a marker gene and a 3-OST 

2 gene.

1 39. The method of claim 37, wherein said assaying step comprises detecting a change in 3- 

2 OST mRNA level. 

1 40. The method of claim 37, wherein said assaying step comprises detecting a change in 3- 

2 OST protein encoded by said nucleic acid. 

1 41. A method of determining partial sequence information for complex polysaccharides

2 comprising the steps of: 

3 contacting a first sample of polysaccharide with at least one ligand which binds 

4 polysaccharides in a sequence specific manner; 

5 contacting the resulting polysaccharide-ligand complex with at least one agent that 

6 modifies complex polysaccharides; 

7 contacting a second sample of polysaccharide with the same modifying agent; 

8 comparing said first and second samples for ligand-specific inhibition of modifications 

9 caused by said modifying agent. 

1 42. The method of claim 41, wherein said complex polysaccharide is a glycosaminoglycan. 

1 43. The method of claim 41, wherein said ligand is catalytically inactive. 

1 44. The method of claim 41, wherein said ligand is an inactive 3-OST. 

1 45. The method of claim 41, wherein said agent that modifies polysaccharides is selected from 

2 the group consisting of epimerases, lyases, sulfotransferases, N-acetyltransferases, N- 

3 deacetylases, epimerases. 

1 46. The method of claim 45, wherein said modifying agent is a sequence specific degrading 

2 agent. 

1 47. The method of claim 45, wherein said modifying agent is a non-sequence specific 

2 degrading agent. 
1 48. The method of claim 46, wherein said degrading agent is a lyase.

1 49. The method of claim 47, wherein said non-sequence specific degrading agent is nitrous acid.

1 50. The method of claim 45, further comprising affinity purifying said modified first and 

2 second samples. 

1 51. The method of claim 45, wherein the step of comparing includes a comparison of size 

2 profiles.

1 52. A method of determining partial sequence information for complex polysaccharides 

2 comprising the steps of: 

3 contacting a first sample of complex polysaccharides with a 3-OST protein in the presence 

4 of a sulfate donor under conditions which permit sulfation by said 3-OST; 

5 contacting said first sample and a second sample with at least one enzyme which cleaves 

6 polysaccharides in a sequence-specific manner; 

7 determining the size profiles of the resulting fragments. 

1 53. The method of claim 52, wherein the determining the size profile step further comprises 

2 the step of comparing said first sample to a second sample cleaved by the same enzymes. 

1 54. The method of claim 52, wherein said enzymes which degrade polysaccharides in a 

2 sequence specific manner are selected from the group consisting of polysaccharide lyases, 

3 heparinase I, heparinase II, and heparinase III.

1 55. A method of determining partial sequence information for a sample containing complex

2 polysaccharides comprising the steps of: 

3 contacting said sample of polysaccharide with a 3-OST protein which lacks enzymatic 

4 function under conditions which permit said 3-OST protein to bind to said polysaccharide

5 in a sequence specific manner; 

6 applying said sample to an affinity column;

7 applying degrading agents to said column; 

8 analyzing the resulting degradation products. 

1 56. The method of claim 55, further comprising repeating the steps applying degrading agents 

2 and analyzing using a series of different sequence specific polysaccharide cleavage enzymes. 

1 57. An isolated nucleic acid comprising a 5' untranslated regulatory region of a 3-OST gene 

2 operably joined to a marker gene. 
1 58. A host cell transformed with the isolated nucleic acid of claim 57, or a descendent thereof.

1 59. A method of identifying compounds capable of modulating the expression of a 3-OST

2 gene comprising contacting a candidate compound with the transformed host cell of claim 58 and

3 assaying for changes in expression of said marker. 

1 60. A method as in claim 59, wherein said regulatory region comprises the 5' untranslated

2 region of SEQ ID NO: 16.
<141>
<151> 1997-10-24

<160> 16 

<170> PatentIn Ver. 2.0

<210> 1 

<211> 1685 

<212> DNA 

<213> Mus musculus 

<220> 
<221> CDS 

<222> (323)..(1255)
<223> mouse 3-OST-l 

<400> 1 



tgcattgcaa tgtgaagtgt tcctgaataa acctgcttga agaaggacaa cgtggtgttg 60

cgtctttcct gctggtcggg gtggaataga cacctcccct ttttaacttg ggtgacctca 120

tgaacataaa agaacttaaa ggtagcaagc catggactta aagtaggctg accttgaact 180

cagagatctt cttggcaatg tctctggaga ttaaagtaat tggcaactgg agatactcat 240

gttccagtaa tcaagaggga gccttgctgc tacttcatga tccaggcgcg tgtggcccag 300

tgaagtccct gagctgtaca gc atg acc ttg ctg ctc ctg ggt gcg gtg ctg 352
Met Thr Leu Leu Leu Leu Gly Ala Val Leu
1 5 10

ctg gtg gcc cag ccc cag ctt gtg cat tcc cac ccg gct gct cct ggc 400
Leu Val Ala Gln Pro Gln Leu Val His Ser His Pro Ala Ala Pro Gly
15 20 25

ccg ggg ctc aaa cag cag gag ctt ctg agg aag gtg att att ctc cca 448
Pro Gly Leu Lys Gln Gln Glu Leu Leu Arg Lys Val Ile Ile Leu Pro
30 35 40

gag gac acc gga gaa ggc aca gca tcc aat ggt tcc aca cag cag ctg 496
Glu Asp Thr Gly Glu Gly Thr Ala Ser Asn Gly Ser Thr Gln Gln Leu
45 50 55

cca cag acc atc atc att ggg gtg cgc aag ggt ggt acc cga gcc ctg 544
Pro Gln Thr Ile Ile Ile Gly Val Arg Lys Gly Gly Thr Arg Ala Leu
60 65 70

cta gag atg ctc agc ctg cat cct gat gtt gct gca gct gaa aac gag 592
Leu Glu Met Leu Ser Leu His Pro Asp Val Ala Ala Ala Glu Asn Glu
75 80 85 90

gtc cat ttc ttt gac tgg gag gag cat tac agc caa ggc ctg ggc tgg 640
Val His Phe Phe Asp Trp Glu Glu His Tyr Ser Gln Gly Leu Gly Trp
95 100 105

tac ctc acc cag atg ccc ttc tcc tcc cct cac cag ctc acc gtg gag 688
Tyr Leu Thr Gln Met Pro Phe Ser Ser Pro His Gln Leu Thr Val Glu
110 115 120

aag aca ccc gcc tat ttc act tcg ccc aaa gtg cct gag aga atc cac 736
Lys Thr Pro Ala Tyr Phe Thr Ser Pro Lys Val Pro Glu Arg Ile His
125 130 135

agc atg aac ccc acc atc cgc ctg ctg ctt atc ctg agg gac cca tca 784
Ser Met Asn Pro Thr Ile Arg Leu Leu Leu Ile Leu Arg Asp Pro Ser
140 145 150

gag cgc gtg ctg tcc gac tac acc cag gtg ttg tac aac cac ctt cag 832
Glu Arg Val Leu Ser Asp Tyr Thr Gln Val Leu Tyr Asn His Leu Gln
155 160 165 170

aag cac aag ccc tat cca ccc att gag gac ctc cta atg cgg gac ggt 880
Lys His Lys Pro Tyr Pro Pro Ile Glu Asp Leu Leu Met Arg Asp Gly
175 180 185

cgg ctg aac ctg gac tac aag gct ctc aac cgc agc ctg tac cat gca 928
Arg Leu Asn Leu Asp Tyr Lys Ala Leu Asn Arg Ser Leu Tyr His Ala
190 195 200

cac atg ctg aac tgg ctg cgt ttt ttc ccg ttg ggc cac atc cac att 976
His Met Leu Asn Trp Leu Arg Phe Phe Pro Leu Gly His Ile His Ile
205 210 215

gtg gat ggc gac cgc ctc atc aga gac cct ttc cct gag atc cag aag 1024
Val Asp Gly Asp Arg Leu Ile Arg Asp Pro Phe Pro Glu Ile Gln Lys
220 225 230

gtc gaa aga ttc ctg aag ctt tct cca cag atc aac gcc tcg aac ttc 1072
Val Glu Arg Phe Leu Lys Leu Ser Pro Gln Ile Asn Ala Ser Asn Phe
235 240 245 250

tac ttt aac aaa acc aag ggc ttc tac tgc ctg cgg gac agt ggc aag 1120
Tyr Phe Asn Lys Thr Lys Gly Phe Tyr Cys Leu Arg Asp Ser Gly Lys
255 260 265

gac cgc tgc tta cac gag tcc aaa ggc cgg gcg cac ccc cag gtg gat 1168
Asp Arg Cys Leu His Glu Ser Lys Gly Arg Ala His Pro Gln Val Asp
270 275 280

ccc aaa cta ctt gat aaa ctg cac gaa tac ttt cat gag cca aat aag 1216
Pro Lys Leu Leu Asp Lys Leu His Glu Tyr Phe His Glu Pro Asn Lys
285 290 295

aaa ttt ttc aag ctc gtg ggc aga aca ttc gac tgg cac tgatttgccg 1265
Lys Phe Phe Lys Leu Val Gly Arg Thr Phe Asp Trp His
300 305 310

tctcctaggc tcgggacttt tcctgttgtt aacttctggt gtacatctga aggggggagg 1325 

aaaataattt taaaaaggca tttaagctat aatttatttg taaaacccac aaatgacttc 1385 

tgtacagtat tagattcaca gttgccatat atagtagtta tatttttcta cttgttaaat 1445 

ggagggcgtt ttgtattgtt tttcatggtt gttaacattg tgtatatgtc tctataatat 1505 

gaaggaactt aactattgca ctgaaaaaat aagagatttt ttttttctgg agacctcttt 1565
ttttgttgtt gttgttttaa atataattaa cctgcctcca atccaaaata gctctttgtt 1625 
ttcacctcct tgtcaaatct ataatctttt tctgcttaaa aaatttattg gtattatgga 1685 

<210> 2 

<211> 311 

<212> PRT 

<213> Mus musculus

<400> 2 

Met Thr Leu Leu Leu Leu Gly Ala Val Leu Leu Val Ala Gln Pro Gln
1 5 10 15

Leu Val His Ser His Pro Ala Ala Pro Gly Pro Gly Leu Lys Gln Gln
20 25 30

Glu Leu Leu Arg Lys Val Ile Ile Leu Pro Glu Asp Thr Gly Glu Gly
35 40 45

Thr Ala Ser Asn Gly Ser Thr Gln Gln Leu Pro Gln Thr Ile Ile Ile
50 55 60

Gly Val Arg Lys Gly Gly Thr Arg Ala Leu Leu Glu Met Leu Ser Leu 
65 70 75 80 

His Pro Asp Val Ala Ala Ala Glu Asn Glu Val His Phe Phe Asp Trp 
85 90 95 

Glu Glu His Tyr Ser Gln Gly Leu Gly Trp Tyr Leu Thr Gln Met Pro
100 105 110

Phe Ser Ser Pro His Gln Leu Thr Val Glu Lys Thr Pro Ala Tyr Phe
115 120 125

Thr Ser Pro Lys Val Pro Glu Arg Ile His Ser Met Asn Pro Thr Ile
130 135 140

Arg Leu Leu Leu Ile Leu Arg Asp Pro Ser Glu Arg Val Leu Ser Asp
145 150 155 160

Tyr Thr Gln Val Leu Tyr Asn His Leu Gln Lys His Lys Pro Tyr Pro
165 170 175

Pro Ile Glu Asp Leu Leu Met Arg Asp Gly Arg Leu Asn Leu Asp Tyr
180 185 190

Lys Ala Leu Asn Arg Ser Leu Tyr His Ala His Met Leu Asn Trp Leu 
195 200 205 

Arg Phe Phe Pro Leu Gly His Ile His Ile Val Asp Gly Asp Arg Leu
210 215 220

Ile Arg Asp Pro Phe Pro Glu Ile Gln Lys Val Glu Arg Phe Leu Lys
225 230 235 240

Leu Ser Pro Gln Ile Asn Ala Ser Asn Phe Tyr Phe Asn Lys Thr Lys
245 250 255

Gly Phe Tyr Cys Leu Arg Asp Ser Gly Lys Asp Arg Cys Leu His Glu
260 265 270

Ser Lys Gly Arg Ala His Pro Gln Val Asp Pro Lys Leu Leu Asp Lys
275 280 285

Leu His Glu Tyr Phe His Glu Pro Asn Lys Lys Phe Phe Lys Leu Val
290 295 300

Gly Arg Thr Phe Asp Trp His 
305 310 
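
As a consistency check on the listing, the length of each <222> CDS range, divided by three, should equal the <211> length of the corresponding protein sequence. The following Python sketch is not part of the original disclosure: the dictionary layout and the function names are illustrative assumptions, and the coordinates are simply copied from the <222> and <211> fields printed for SEQ ID NOS: 1 to 8. It assumes, as the printed numbering suggests, that each CDS range ends at the last sense codon and excludes the stop codon.

```python
# CDS ranges (<222>) and protein lengths (<211>) as printed in this listing.
# Keys are (DNA SEQ ID NO, protein SEQ ID NO); the layout is an assumption
# made for illustration only.
CDS_FEATURES = {
    (1, 2): {"cds": (323, 1255), "protein_len": 311},  # mouse 3-OST-1
    (3, 4): {"cds": (119, 1039), "protein_len": 307},  # human 3-OST-1
    (5, 6): {"cds": (73, 1173), "protein_len": 367},   # human 3-OST-2
    (7, 8): {"cds": (799, 2016), "protein_len": 406},  # human 3-OST-3A
}


def codon_count(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive CDS range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"CDS {start}..{end} is not a whole number of codons")
    return length // 3


def check_listing(features: dict) -> dict:
    """Map each (DNA, protein) pair to True when the lengths agree."""
    return {
        ids: codon_count(*entry["cds"]) == entry["protein_len"]
        for ids, entry in features.items()
    }


if __name__ == "__main__":
    for ids, ok in check_listing(CDS_FEATURES).items():
        print(ids, "consistent" if ok else "MISMATCH")
```

A mismatch from such a check would point to a transcription error in either the coordinates or the protein length, not to a property of the enzymes themselves.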



<210> 3 

<211> 1305 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 

<222> (119)..(1039)
<223> human 3-OST-1

<400> 3 

cgcggctcag taattgaagg cctgaaacgc ccatgtgcca ctgactagga ggcttccctg 60 

ctgcggcact tcatgaccca gcggcgcgcg gcccagtgaa gccaccgtgg tgtccagc 118 

atg gcc gcg ctg ctc ctg ggc gcg gtg ctg ctg gtg gcc cag ccc cag 166
Met Ala Ala Leu Leu Leu Gly Ala Val Leu Leu Val Ala Gln Pro Gln
1 5 10 15

cta gtg cct tcc cgc ccc gcc gag cta ggc cag cag gag ctt ctg cgg 214
Leu Val Pro Ser Arg Pro Ala Glu Leu Gly Gln Gln Glu Leu Leu Arg
20 25 30

aaa gcg ggg acc ctc cag gat gac gtc cgc gat ggc gtg gcc cca aac 262
Lys Ala Gly Thr Leu Gln Asp Asp Val Arg Asp Gly Val Ala Pro Asn
35 40 45

ggc tct gcc cag cag ttg ccg cag acc atc atc atc ggc gtg cgc aag 310
Gly Ser Ala Gln Gln Leu Pro Gln Thr Ile Ile Ile Gly Val Arg Lys
50 55 60

ggc ggc acg cgc gca ctg ctg gag atg ctc agc ctg cac ccc gac gtg 358
Gly Gly Thr Arg Ala Leu Leu Glu Met Leu Ser Leu His Pro Asp Val
65 70 75 80

gcg gcc gcg gag aac gag gtc cac ttc ttc gac tgg gag gag cat tac 406
Ala Ala Ala Glu Asn Glu Val His Phe Phe Asp Trp Glu Glu His Tyr
85 90 95

agc cac ggc ttg ggc tgg tac ctc agc cag atg ccc ttc tcc tgg cca 454
Ser His Gly Leu Gly Trp Tyr Leu Ser Gln Met Pro Phe Ser Trp Pro
100 105 110

cac cag ctc aca gtg gag aag acc ccc gcg tat ttc acg tcg ccc aaa 502
His Gln Leu Thr Val Glu Lys Thr Pro Ala Tyr Phe Thr Ser Pro Lys
115 120 125

gtg cct gag cga gtc tac agc atg aac ccg tcc atc cgg ctg ctg ctc 550
Val Pro Glu Arg Val Tyr Ser Met Asn Pro Ser Ile Arg Leu Leu Leu
130 135 140

atc ctg cga gac ccg tcg gag cgc gtg cta tct gac tac acc caa gtg 598
Ile Leu Arg Asp Pro Ser Glu Arg Val Leu Ser Asp Tyr Thr Gln Val
145 150 155 160

ttc tac aac cac atg cag aag cac aag ccc tac ccg tcc atc gag gag 646
Phe Tyr Asn His Met Gln Lys His Lys Pro Tyr Pro Ser Ile Glu Glu
165 170 175

ttc ctg gtg cgc gat ggc agg ctc aat gtg gac tac aag gcc ctc aac 694
Phe Leu Val Arg Asp Gly Arg Leu Asn Val Asp Tyr Lys Ala Leu Asn
180 185 190

cgc agc ctc tac cac gtg cac atg cag aac tgg ctg cgc ttt ttc ccg 742
Arg Ser Leu Tyr His Val His Met Gln Asn Trp Leu Arg Phe Phe Pro
195 200 205

ctg cgc cac atc cac att gtg gac ggc gac cgc ctc atc agg gac ccc 790
Leu Arg His Ile His Ile Val Asp Gly Asp Arg Leu Ile Arg Asp Pro
210 215 220

ttc cct gag atc caa aag gtc gag agg ttc cta aag ctg tcg ccg cag 838
Phe Pro Glu Ile Gln Lys Val Glu Arg Phe Leu Lys Leu Ser Pro Gln
225 230 235 240

atc aat gct tcg aac ttc tac ttt aac aaa acc aag ggc ttt tac tgc 886
Ile Asn Ala Ser Asn Phe Tyr Phe Asn Lys Thr Lys Gly Phe Tyr Cys
245 250 255

ctg cgg gac agc ggc cgg gac cgc tgc tta cat gag tcc aaa ggc cgg 934
Leu Arg Asp Ser Gly Arg Asp Arg Cys Leu His Glu Ser Lys Gly Arg
260 265 270

gcg cac ccc caa gtc gat ccc aaa cta ctc aat aaa ctg cac gaa tat 982
Ala His Pro Gln Val Asp Pro Lys Leu Leu Asn Lys Leu His Glu Tyr
275 280 285

ttt cat gag cca aat aag aag ttc ttc gag ctt gtt ggc aga aca ttt 1030
Phe His Glu Pro Asn Lys Lys Phe Phe Glu Leu Val Gly Arg Thr Phe
290 295 300

gac tgg cac tgatttgcaa taagctaagc tcagaaactt tcctactgta 1079
Asp Trp His
305

agttctggtg tacatctgag gggaaaaaga attttaaaaa agcatttaag gtataattta 1139
tttgtaaaat ccataaagta cttctgtaca gtattagatt cacaattgcc atatatacta 1199
gttatatttt tctacttgtt aaatggaggg cattttgtat tgtttttcat ggttgttaac 1259
attgtgtaat atgtctctat atgaaggaac taaactattt cactga 1305



<210> 4 

<211> 307 

<212> PRT 

<213> Homo sapiens 

<400> 4 

Met Ala Ala Leu Leu Leu Gly Ala Val Leu Leu Val Ala Gln Pro Gln
1 5 10 15

Leu Val Pro Ser Arg Pro Ala Glu Leu Gly Gln Gln Glu Leu Leu Arg
20 25 30

Lys Ala Gly Thr Leu Gln Asp Asp Val Arg Asp Gly Val Ala Pro Asn
35 40 45

Gly Ser Ala Gln Gln Leu Pro Gln Thr Ile Ile Ile Gly Val Arg Lys
50 55 60 

Gly Gly Thr Arg Ala Leu Leu Glu Met Leu Ser Leu His Pro Asp Val 
65 70 75 80 

Ala Ala Ala Glu Asn Glu Val His Phe Phe Asp Trp Glu Glu His Tyr 
85 90 95 

Ser His Gly Leu Gly Trp Tyr Leu Ser Gln Met Pro Phe Ser Trp Pro
100 105 110

His Gln Leu Thr Val Glu Lys Thr Pro Ala Tyr Phe Thr Ser Pro Lys
115 120 125

Val Pro Glu Arg Val Tyr Ser Met Asn Pro Ser Ile Arg Leu Leu Leu
130 135 140

Ile Leu Arg Asp Pro Ser Glu Arg Val Leu Ser Asp Tyr Thr Gln Val
145 150 155 160

Phe Tyr Asn His Met Gln Lys His Lys Pro Tyr Pro Ser Ile Glu Glu
165 170 175 

Phe Leu Val Arg Asp Gly Arg Leu Asn Val Asp Tyr Lys Ala Leu Asn 
180 185 190 

Arg Ser Leu Tyr His Val His Met Gln Asn Trp Leu Arg Phe Phe Pro
195 200 205

Leu Arg His Ile His Ile Val Asp Gly Asp Arg Leu Ile Arg Asp Pro
210 215 220

Phe Pro Glu Ile Gln Lys Val Glu Arg Phe Leu Lys Leu Ser Pro Gln
225 230 235 240

Ile Asn Ala Ser Asn Phe Tyr Phe Asn Lys Thr Lys Gly Phe Tyr Cys
245 250 255

Leu Arg Asp Ser Gly Arg Asp Arg Cys Leu His Glu Ser Lys Gly Arg
260 265 270

Ala His Pro Gln Val Asp Pro Lys Leu Leu Asn Lys Leu His Glu Tyr
275 280 285 

Phe His Glu Pro Asn Lys Lys Phe Phe Glu Leu Val Gly Arg Thr Phe 
290 295 300 

Asp Trp His 
305 
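
The pairing of codon lines with amino-acid lines in the CDS entries follows the standard genetic code. The Python sketch below is an illustration only and not part of the original disclosure: the names `CODON_TABLE` and `translate` are hypothetical, the table is the standard code written in the usual t, c, a, g order, and one-letter amino-acid codes are used for brevity where the listing uses three-letter codes. The two test strings are the first ten codons of the mouse and human 3-OST-1 coding regions, assuming the fifth codon of each reads `ctc`.

```python
# Standard genetic code, bases ordered t, c, a, g at each codon position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon -> one-letter amino acid table ('*' marks a stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; spaces between codons are ignored."""
    seq = "".join(cds.split()).lower()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# First ten codons of the mouse (SEQ ID NO: 1) and human (SEQ ID NO: 3) CDS.
MOUSE_START = "atg acc ttg ctg ctc ctg ggt gcg gtg ctg"
HUMAN_START = "atg gcc gcg ctg ctc ctg ggc gcg gtg ctg"

if __name__ == "__main__":
    print(translate(MOUSE_START))  # Met Thr Leu Leu Leu Leu Gly Ala Val Leu
    print(translate(HUMAN_START))  # Met Ala Ala Leu Leu Leu Gly Ala Val Leu
```

Translating a codon line and comparing it with the printed amino-acid line is a quick way to spot transcription errors in either line.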

<210> 5 

<211> 1951 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (73)..(1173)

<223> human 3-OST-2 

<400> 5 

cgcagggcca cagcagctca gccgccggtg ccccctcgga aaccatgacc cccggcgcgg 60 

gcccatggag cc atg gcc tat agg gtc ctg ggc cgc gcg ggg cca cct cag 111
Met Ala Tyr Arg Val Leu Gly Arg Ala Gly Pro Pro Gln
1 5 10

ccg cgg agg gcg cgc agg ctg ctc ttc gcc ttc acg ctc tcg ctc tcc 159
Pro Arg Arg Ala Arg Arg Leu Leu Phe Ala Phe Thr Leu Ser Leu Ser
15 20 25

tgc act tac ctg tgt tac agc ttc ctg tgc tgc tgc gac gac ctg ggt 207
Cys Thr Tyr Leu Cys Tyr Ser Phe Leu Cys Cys Cys Asp Asp Leu Gly
30 35 40 45

cgg agc cgc ctc ctc ggc gcg cct cgc tgc ctc cgc ggc ccc agc gcg 255
Arg Ser Arg Leu Leu Gly Ala Pro Arg Cys Leu Arg Gly Pro Ser Ala
50 55 60

ggc ggc cag aaa ctt ctc cag aag tcc cgc ccc tgt gat ccc tcc ggg 303
Gly Gly Gln Lys Leu Leu Gln Lys Ser Arg Pro Cys Asp Pro Ser Gly
65 70 75

ccg acg ccc agc gag ccc agc gct ccc agc gcg ccc gcc gcc gcc gtg 351
Pro Thr Pro Ser Glu Pro Ser Ala Pro Ser Ala Pro Ala Ala Ala Val
80 85 90

ccc gcc cct cgc ctc tcc ggt tcc aac cac tcc ggc tca ccc aag ctg 399
Pro Ala Pro Arg Leu Ser Gly Ser Asn His Ser Gly Ser Pro Lys Leu
95 100 105

ggt acc aag cgg ttg ccc caa gcc ctc att gtg ggc gtg aag aag ggg 447
Gly Thr Lys Arg Leu Pro Gln Ala Leu Ile Val Gly Val Lys Lys Gly
110 115 120 125

ggc acc cgg gcc gtg ctg gag ttt atc cga gta cac ccg gac gtg cgg 495
Gly Thr Arg Ala Val Leu Glu Phe Ile Arg Val His Pro Asp Val Arg
130 135 140

gcc ttg ggc acg gaa ccc cac ttc ttt gac agg aac tac ggc cgc ggg 543
Ala Leu Gly Thr Glu Pro His Phe Phe Asp Arg Asn Tyr Gly Arg Gly
145 150 155

ctg gat tgg tac agg agc ctg atg ccc agg acc ctc gag agc cag atc 591
Leu Asp Trp Tyr Arg Ser Leu Met Pro Arg Thr Leu Glu Ser Gln Ile
160 165 170

acg ctg gag aag acg ccc agc tac ttt gtc act caa gag gct cct cga 639
Thr Leu Glu Lys Thr Pro Ser Tyr Phe Val Thr Gln Glu Ala Pro Arg
175 180 185

cgc atc ttc aac atg tcc cga gac acc aag ctg atc gtg gtt gtg cgg 687
Arg Ile Phe Asn Met Ser Arg Asp Thr Lys Leu Ile Val Val Val Arg
190 195 200 205

aac cct gtg acc cgt gcc atc tct gat tac acg cag aca ctc tcc aag 735
Asn Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys
210 215 220

aag ccc gac atc ccg acc ttt gag ggc ctc tcc ttc cgc aac cgc acc 783
Lys Pro Asp Ile Pro Thr Phe Glu Gly Leu Ser Phe Arg Asn Arg Thr
225 230 235

ctg ggc ctg gtg gac gtg tcg tgg aac gcc atc cgc atc ggc atg tac 831
Leu Gly Leu Val Asp Val Ser Trp Asn Ala Ile Arg Ile Gly Met Tyr
240 245 250

gtg ctg cac ctg gag agc tgg ctg cag tac ttc ccg cta gct cag att 879
Val Leu His Leu Glu Ser Trp Leu Gln Tyr Phe Pro Leu Ala Gln Ile
255 260 265

cac ttc gtc agt ggc gag cga ctc atc act gac ccg gcc ggc gag atg 927
His Phe Val Ser Gly Glu Arg Leu Ile Thr Asp Pro Ala Gly Glu Met
270 275 280 285

ggg cga gtc cag gac ttc ctg ggc att aag aga ttc atc acg gac aag 975
Gly Arg Val Gln Asp Phe Leu Gly Ile Lys Arg Phe Ile Thr Asp Lys
290 295 300

cac ttc tat ttc aac aag acc aaa gga ttc cct tgc ttg aaa aaa aca 1023
His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Thr
305 310 315

gaa tcg agc ctc ctg cct cga tgc ttg ggc aaa tca aaa ggg aga act 1071
Glu Ser Ser Leu Leu Pro Arg Cys Leu Gly Lys Ser Lys Gly Arg Thr
320 325 330

cat gta cag att gat cct gaa gtg ata gac cag ctc cga gaa ttt tat 1119
His Val Gln Ile Asp Pro Glu Val Ile Asp Gln Leu Arg Glu Phe Tyr
335 340 345

aga ccg tat aat atc aaa ttt tat gaa acc gtt ggg cag gac ttc agg 1167
Arg Pro Tyr Asn Ile Lys Phe Tyr Glu Thr Val Gly Gln Asp Phe Arg
350 355 360 365



tgg gaa taagcccacg aaaggaaagg gctctcaagg gctcttctgc tcatctcttc 1223
Trp Glu

cgtgagattt gctcccagac cctcttatct ccctccaaca aaccctggct ccagccccct 1283

ttcccaactt gagttgcatc atcttggaac caggaagccc agctaaagcc aagagaccag 1343

agagtctctg ccactagttt tcatcagtct gttcaagcaa agttgatctg ctcctggcac 1403

gtccagtaaa ttccagaatc attctccttt ctgcccataa agggccttgg agaattgctt 1463

taagaagagt gaatgttcca atgatgatag atattataag cgacgatggt tctgttgcta 1523

tgaacacagc agtcggtccc tgtcattgtc cacccaggag tggccttgtt aattccaagt 1583

ggcatgtatc ttccctctga gcttcatttc ttcaagatgc tctgggtggt gggatgggag 1643

accatcctca gccctcctca gaccttatca attcattgag agattgcaaa gctgaaagca 1703

cctccggcca ctcctgggag acagaccctt tggtgatgaa ataaaccagt gacttcagag 1763

cctatggtct caactgtgct tgaaaaacac tgtctctgaa aacaactttg tgattctccc 1823

tgctccctgt ggacaaaagc acataattct gctgttacgg gtactttgct catacgagct 1883

ttcatgttca gcatgcaatg gaatcatgct tgtccatgtg aaataaatat ggctctctcg 1943

tgtcctta 1951


<210> 6 
<211> 367 
<212> PRT 
<213> Homo sapiens

<400> 6 

Met Ala Tyr Arg Val Leu Gly Arg Ala Gly Pro Pro Gln Pro Arg Arg
1 5 10 15

Ala Arg Arg Leu Leu Phe Ala Phe Thr Leu Ser Leu Ser Cys Thr Tyr 
20 25 30 

Leu Cys Tyr Ser Phe Leu Cys Cys Cys Asp Asp Leu Gly Arg Ser Arg 
35 40 45 

Leu Leu Gly Ala Pro Arg Cys Leu Arg Gly Pro Ser Ala Gly Gly Gln
50 55 60

Lys Leu Leu Gln Lys Ser Arg Pro Cys Asp Pro Ser Gly Pro Thr Pro
65 70 75 80 

Ser Glu Pro Ser Ala Pro Ser Ala Pro Ala Ala Ala Val Pro Ala Pro 
85 90 95 

Arg Leu Ser Gly Ser Asn His Ser Gly Ser Pro Lys Leu Gly Thr Lys 
100 105 110 






WO 99/22005 PCT/US98/22597


Arg Leu Pro Gln Ala Leu Ile Val Gly Val Lys Lys Gly Gly Thr Arg
115 120 125

Ala Val Leu Glu Phe Ile Arg Val His Pro Asp Val Arg Ala Leu Gly
130 135 140 

Thr Glu Pro His Phe Phe Asp Arg Asn Tyr Gly Arg Gly Leu Asp Trp 
145 150 155 160 

Tyr Arg Ser Leu Met Pro Arg Thr Leu Glu Ser Gln Ile Thr Leu Glu
165 170 175

Lys Thr Pro Ser Tyr Phe Val Thr Gln Glu Ala Pro Arg Arg Ile Phe
180 185 190

Asn Met Ser Arg Asp Thr Lys Leu Ile Val Val Val Arg Asn Pro Val
195 200 205

Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys Lys Pro Asp
210 215 220

Ile Pro Thr Phe Glu Gly Leu Ser Phe Arg Asn Arg Thr Leu Gly Leu
225 230 235 240

Val Asp Val Ser Trp Asn Ala Ile Arg Ile Gly Met Tyr Val Leu His
245 250 255

Leu Glu Ser Trp Leu Gln Tyr Phe Pro Leu Ala Gln Ile His Phe Val
260 265 270

Ser Gly Glu Arg Leu Ile Thr Asp Pro Ala Gly Glu Met Gly Arg Val
275 280 285

Gln Asp Phe Leu Gly Ile Lys Arg Phe Ile Thr Asp Lys His Phe Tyr
290 295 300

Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Thr Glu Ser Ser
305 310 315 320

Leu Leu Pro Arg Cys Leu Gly Lys Ser Lys Gly Arg Thr His Val Gln
325 330 335

Ile Asp Pro Glu Val Ile Asp Gln Leu Arg Glu Phe Tyr Arg Pro Tyr
340 345 350

Asn Ile Lys Phe Tyr Glu Thr Val Gly Gln Asp Phe Arg Trp Glu
355 360 365 



<210> 7 

<211> 2314 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 

<222> (799)..(2016)
<223> human 3-OST-3A 

<400> 7 

cagcggcggc ccaggaggca gccggtgagc gcctgcgagc agagtggcgg gggccgctga 60 
caggtcccgc gcagcccagc ccagcccagc cacgcggctc acaggtgggg tccaagagca 120 
gtttggagca acccggcgct acggagaggg gtggacggct ctgcacgggc ctcctgtctc 180 
ccgctcgggc agagggactc ggggggacct cgctccttgg ccgagagaac ctgaactcgg 240 
gcggagagaa cgcgcccagg cgggcaaggg gaccagagaa agccggggct ggaagtcact 300

gtcgctcgcc actgtctgga gcgcacggag cgcagaggcc cggcagccgc gcgtgccctc 360

ccggggaccg agccagtgat gcaggatcgc tgagcggaga tccgcgccga gaagtctctc 420

ggggccgggg ctgagacgca cgccttcgac accgctgcca agaccccgat tccggcgact 480

cttgcgggga accgaggggc caaggctgcc ccaagctcag gacttgggcg agtctaagac 540

gatggtttct taagcacgga cccgcgttcc ccttcccgcc ccctcgactg gaggcaggga 600

tcctgcgcgg ggcccccggg attccgtttc cccgcggagc cccggccgct gcctcccggg 660

acagttcgca cggccacagg ggcgcacggc gatgtggcct ccgtccagcg cgctggcccg 720

ccggggggat gctctggcac ctgtcggggt ccaggcctag catggccggc gcgttgcccg 780

acgtcgcctc cggctagg atg gcc cct ccg ggc ccg gcc agt gcc ctc tcc 831
Met Ala Pro Pro Gly Pro Ala Ser Ala Leu Ser
1 5 10



acc tcg gcc gag ccg ctg tcc cgc agc atc ttc cgg aag ttc ttg ctg 879
Thr Ser Ala Glu Pro Leu Ser Arg Ser Ile Phe Arg Lys Phe Leu Leu
15 20 25

atg ctc tgc tcc ctg ctc acg tcc ctt tac gtc ttc tac tgc ctg gcc 927
Met Leu Cys Ser Leu Leu Thr Ser Leu Tyr Val Phe Tyr Cys Leu Ala
30 35 40

gag cgc tgc cag acc ctg tcc ggc ccc gtc gtg ggg ctg tcc ggc ggc 975
Glu Arg Cys Gln Thr Leu Ser Gly Pro Val Val Gly Leu Ser Gly Gly
45 50 55

ggc gag gag gcg ggg gcc cct ggt ggc ggc gtc ctg gcc gga ggc ccg 1023
Gly Glu Glu Ala Gly Ala Pro Gly Gly Gly Val Leu Ala Gly Gly Pro
60 65 70 75

agg gag ctg gcg gtg tgg ccg gcg gcg gca cag aga aag cgc ctc ctg 1071
Arg Glu Leu Ala Val Trp Pro Ala Ala Ala Gln Arg Lys Arg Leu Leu
80 85 90

caa ctg ccg cag tgg cgg agg cgc cgg ccg ccc gcg ccc cgc gac gac 1119
Gln Leu Pro Gln Trp Arg Arg Arg Arg Pro Pro Ala Pro Arg Asp Asp
95 100 105

ggc gag gag gcg gcc tgg gaa gaa gag tcc cct ggc ctg tca ggg ggt 1167
Gly Glu Glu Ala Ala Trp Glu Glu Glu Ser Pro Gly Leu Ser Gly Gly
110 115 120

ccg ggc ggc tcc ggg gcc gga agc acc gtg gcc gag gcc ccg ccg ggg 1215
Pro Gly Gly Ser Gly Ala Gly Ser Thr Val Ala Glu Ala Pro Pro Gly
125 130 135

acc ctg gcg ctg ctc ctg gac gaa ggc agc aag cag ctg ccg cag gcc 1263
Thr Leu Ala Leu Leu Leu Asp Glu Gly Ser Lys Gln Leu Pro Gln Ala
140 145 150 155

atc atc atc gga gtg aag aag ggc ggc acg cgg gcg ctg ctg gag ttc 1311
Ile Ile Ile Gly Val Lys Lys Gly Gly Thr Arg Ala Leu Leu Glu Phe
160 165 170

ctg cgc gtg cac ccc gac gtg cgc gcc gtg ggc gcc gag ccc cac ttc 1359
Leu Arg Val His Pro Asp Val Arg Ala Val Gly Ala Glu Pro His Phe
175 180 185

ttc gac cgc agc tac gac aag ggc ctc gcc tgg tac cgg gac ctg atg 1407
Phe Asp Arg Ser Tyr Asp Lys Gly Leu Ala Trp Tyr Arg Asp Leu Met
190 195 200

ccc aga acc ctg gac ggg cag atc acc atg gag aag acg ccc agt tac 1455
Pro Arg Thr Leu Asp Gly Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr
205 210 215

ttc gtc acg cgg gag gcc ccc gcg cgc atc tcg gcc atg tcc aag gac 1503
Phe Val Thr Arg Glu Ala Pro Ala Arg Ile Ser Ala Met Ser Lys Asp
220 225 230 235

acc aag ctc atc gtg gtg gtg cgg gac ccg gtg acc agg gcc atc tcg 1551
Thr Lys Leu Ile Val Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser
240 245 250

gac tac acg cag acg ctg tcc aag cgg ccc gac atc ccc acc ttc gag 1599
Asp Tyr Thr Gln Thr Leu Ser Lys Arg Pro Asp Ile Pro Thr Phe Glu
255 260 265

agc ttg acg ttc aaa aac agg aca gcg ggc ctc atc gac acg tcg tgg 1647
Ser Leu Thr Phe Lys Asn Arg Thr Ala Gly Leu Ile Asp Thr Ser Trp
270 275 280

agc gcc atc cag atc ggc atc tac gcc aag cac ctg gag cac tgg ctg 1695
Ser Ala Ile Gln Ile Gly Ile Tyr Ala Lys His Leu Glu His Trp Leu
285 290 295

cgc cac ttc ccc atc cgc cag atg ctc ttc gtg agc ggc gag cgg ctc 1743
Arg His Phe Pro Ile Arg Gln Met Leu Phe Val Ser Gly Glu Arg Leu
300 305 310 315

atc agc gac ccg gcc ggg gag ctg ggc cgc gtg caa gac ttc ctg ggc 1791
Ile Ser Asp Pro Ala Gly Glu Leu Gly Arg Val Gln Asp Phe Leu Gly
320 325 330

ctc aag agg atc atc acg gac aag cac ttc tac ttc aac aag acc aag 1839
Leu Lys Arg Ile Ile Thr Asp Lys His Phe Tyr Phe Asn Lys Thr Lys
335 340 345

ggc ttc ccc tgc ctg aag aag gcg gag ggc agc agc cgg ccc cat tgc 1887
Gly Phe Pro Cys Leu Lys Lys Ala Glu Gly Ser Ser Arg Pro His Cys
350 355 360

ctg ggc aag acc aag ggc agg acc cat cct gag atc gac cgc gag gtg 1935
Leu Gly Lys Thr Lys Gly Arg Thr His Pro Glu Ile Asp Arg Glu Val
365 370 375

gtg cgc agg ctg cgc gag ttc tac cgg cct ttc aac ctc aag ttc tac 1983
Val Arg Arg Leu Arg Glu Phe Tyr Arg Pro Phe Asn Leu Lys Phe Tyr
380 385 390 395

cag atg acc ggg cac gac ttt ggc tgg gat gga taaccatata atttaaaaag 2036
Gln Met Thr Gly His Asp Phe Gly Trp Asp Gly
400 405

aaaaaaaaaa tcaaaatata atatattttt ttaccaatcg gtagagaaga gacagtttaa 2096

tatttgtgct gaaaatatgt ttcagtattt ttttcaatga atgttaagag attgttctca 2156

ctcccgcccc atcttaatgt ataaccaaca ccaaacacgt ggatcaacag aaaaggaaaa 2216

tttcactcgt ctaaacactt tcaattttca gtttttattt tatgttctat atacccagtc 2276

ataaagtata agcatcagtt gtcattaaaa gttttcag 2314

<210> 8 
<211> 406 
<212> PRT 
<213> Homo sapiens

<400> 8

Met Ala Pro Pro Gly Pro Ala Ser Ala Leu Ser Thr Ser Ala Glu Pro 
1 5 10 15

Leu Ser Arg Ser Ile Phe Arg Lys Phe Leu Leu Met Leu Cys Ser Leu
20 25 30

Leu Thr Ser Leu Tyr Val Phe Tyr Cys Leu Ala Glu Arg Cys Gln Thr
35 40 45 

Leu Ser Gly Pro Val Val Gly Leu Ser Gly Gly Gly Glu Glu Ala Gly 
50 55 60 

Ala Pro Gly Gly Gly Val Leu Ala Gly Gly Pro Arg Glu Leu Ala Val 
65 70 75 80 

Trp Pro Ala Ala Ala Gln Arg Lys Arg Leu Leu Gln Leu Pro Gln Trp
85 90 95 

Arg Arg Arg Arg Pro Pro Ala Pro Arg Asp Asp Gly Glu Glu Ala Ala 
100 105 110 

Trp Glu Glu Glu Ser Pro Gly Leu Ser Gly Gly Pro Gly Gly Ser Gly 
115 120 125 

Ala Gly Ser Thr Val Ala Glu Ala Pro Pro Gly Thr Leu Ala Leu Leu 
130 135 140 

Leu Asp Glu Gly Ser Lys Gln Leu Pro Gln Ala Ile Ile Ile Gly Val
145 150 155 160 

Lys Lys Gly Gly Thr Arg Ala Leu Leu Glu Phe Leu Arg Val His Pro 
165 170 175 

Asp Val Arg Ala Val Gly Ala Glu Pro His Phe Phe Asp Arg Ser Tyr 
180 185 190 

Asp Lys Gly Leu Ala Trp Tyr Arg Asp Leu Met Pro Arg Thr Leu Asp 
195 200 205 

Gly Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Arg Glu
210 215 220

Ala Pro Ala Arg Ile Ser Ala Met Ser Lys Asp Thr Lys Leu Ile Val
225 230 235 240

Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr
245 250 255

Leu Ser Lys Arg Pro Asp Ile Pro Thr Phe Glu Ser Leu Thr Phe Lys
260 265 270

Asn Arg Thr Ala Gly Leu Ile Asp Thr Ser Trp Ser Ala Ile Gln Ile
275 280 285

Gly Ile Tyr Ala Lys His Leu Glu His Trp Leu Arg His Phe Pro Ile
290 295 300

Arg Gln Met Leu Phe Val Ser Gly Glu Arg Leu Ile Ser Asp Pro Ala
305 310 315 320

Gly Glu Leu Gly Arg Val Gln Asp Phe Leu Gly Leu Lys Arg Ile Ile
325 330 335

Thr Asp Lys His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu
340 345 350

Lys Lys Ala Glu Gly Ser Ser Arg Pro His Cys Leu Gly Lys Thr Lys
355 360 365

Gly Arg Thr His Pro Glu Ile Asp Arg Glu Val Val Arg Arg Leu Arg
370 375 380

Glu Phe Tyr Arg Pro Phe Asn Leu Lys Phe Tyr Gln Met Thr Gly His
385 390 395 400 

Asp Phe Gly Trp Asp Gly 
405 

<210> 9 

<211> 2032 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 

<222> (331) . . (1500) 
<223> human 3-OST-3B 

<400> 9 

gtggccaggg cgcgagagtg caacgtcctc ctggccccga gcgcgtcgtc gcgccccggg 60 

agcagaccct cgcccagcag ttaccgccgt cccgactttc cgttccagtt gcagctcctg 120 

ccgggcaaca tgtcaagagc cgccgccgct acagctgccg ccgccacctg gggaagagca 180 

gcagcagcag cggcggccgc gggcacacgg gggcaataaa ccgagccacc cgggcgtcca 240 

gcgtgccggg gaaccctctc tgcgctcact gcccggcggg acccacgcca tgtgctgagc 300 

catgtccctg gccgcgcccg cgggcagcgc atg ggg cag cgc ctg agt ggc ggc 354 
Met Gly Gln Arg Leu Ser Gly Gly 
1 5 

aga tct tgc ctc gat gtc ccc ggc cgg ctc cta ccg cag ccg ccg ccg 402 
Arg Ser Cys Leu Asp Val Pro Gly Arg Leu Leu Pro Gln Pro Pro Pro 
10 15 20 

ccc ccg ccg ccg gtg agg agg aag ctc gcg ctg ctc ttc gcc atg ctc 450 
Pro Pro Pro Pro Val Arg Arg Lys Leu Ala Leu Leu Phe Ala Met Leu 
25 30 35 40 

tgc gtc tgg ctc tat atg ttc ctg tac tcg tgc gcc ggc tcc tgc gcc 498 
Cys Val Trp Leu Tyr Met Phe Leu Tyr Ser Cys Ala Gly Ser Cys Ala 
45 50 55 

gcc gcg ccg ggg ctg ctg ctc ctg ggc tct ggg tcc cgc gcc gca cac 546 
Ala Ala Pro Gly Leu Leu Leu Leu Gly Ser Gly Ser Arg Ala Ala His 
60 65 70 

gac ccg cca gcc ctg gcc aca gct ccg gac ggg acg ccc ccc agg ctg 594 
Asp Pro Pro Ala Leu Ala Thr Ala Pro Asp Gly Thr Pro Pro Arg Leu 
75 80 85 

ccg ttc cgg gcg ccg cca gcc acc cca ctg gct tca ggc aag gag atg 642 
Pro Phe Arg Ala Pro Pro Ala Thr Pro Leu Ala Ser Gly Lys Glu Met 
90 95 100 

gcc gag ggc gct gcg agc ccg gag gag cag agt ccc gag gtg ccg gac 690 
Ala Glu Gly Ala Ala Ser Pro Glu Glu Gln Ser Pro Glu Val Pro Asp 
105 110 115 120 

tcc cca agc ccc atc tcc agc ttt ttc agt ggg tct ggg agc aag cag 738 
Ser Pro Ser Pro Ile Ser Ser Phe Phe Ser Gly Ser Gly Ser Lys Gln 
125 130 135 

ctg ccg cag gcc atc atc atc ggc gtg aag aag ggc ggc acg cgg gcg 786 
Leu Pro Gln Ala Ile Ile Ile Gly Val Lys Lys Gly Gly Thr Arg Ala 
140 145 150 

ctg ctg gag ttt ctg cgc gtg cac ccc gac gtg cgc gcc gtg ggc gcc 834 
Leu Leu Glu Phe Leu Arg Val His Pro Asp Val Arg Ala Val Gly Ala 
155 160 165 

gag ccc cat ttc ttc gat cgc agc tac gac aag ggc ctc gct tgg tac 882 
Glu Pro His Phe Phe Asp Arg Ser Tyr Asp Lys Gly Leu Ala Trp Tyr 
170 175 180 

cgg gac ctg atg ccc aga acc ctg gac ggg cag atc acc atg gag aag 930 
Arg Asp Leu Met Pro Arg Thr Leu Asp Gly Gln Ile Thr Met Glu Lys 
185 190 195 200 

acg ccc agt tac ttc gtc acg cgg gag gcc ccc gcg cgc atc tcg gcc 978 
Thr Pro Ser Tyr Phe Val Thr Arg Glu Ala Pro Ala Arg Ile Ser Ala 
205 210 215 

atg tcc aag gac acc aag ctc atc gtg gtg gtg cgg gac ccg gtg acc 1026 
Met Ser Lys Asp Thr Lys Leu Ile Val Val Val Arg Asp Pro Val Thr 
220 225 230 

agg gcc atc tcg gac tac acg cag acg ctg tcc aag cgg ccc gac atc 1074 
Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys Arg Pro Asp Ile 
235 240 245 

ccc acc ttc gag agc ttg acg ttc aaa aac agg aca gcg ggc ctc atc 1122 
Pro Thr Phe Glu Ser Leu Thr Phe Lys Asn Arg Thr Ala Gly Leu Ile 
250 255 260 

gac acg tcg tgg agc gcc atc cag atc ggc atc tac gcc aag cac ctg 1170 
Asp Thr Ser Trp Ser Ala Ile Gln Ile Gly Ile Tyr Ala Lys His Leu 
265 270 275 280 

gag cac tgg ctg cgc cac ttc ccc atc cgc cag atg ctc ttc gtg agc 1218 
Glu His Trp Leu Arg His Phe Pro Ile Arg Gln Met Leu Phe Val Ser 
285 290 295 

ggc gag cgg ctc atc agc gac ccg gcc ggg gag ctg ggc cgc gtg caa 1266 
Gly Glu Arg Leu Ile Ser Asp Pro Ala Gly Glu Leu Gly Arg Val Gln 
300 305 310 

gac ttc ctg ggc ctc aag agg atc atc acg gac aag cac ttc tac ttc 1314 
Asp Phe Leu Gly Leu Lys Arg Ile Ile Thr Asp Lys His Phe Tyr Phe 
315 320 325 

aac aag acc aag ggc ttc ccc tgc ctg aag aag gcg gag ggc agc agc 1362 
Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Ala Glu Gly Ser Ser 
330 335 340 

cgg ccc cat tgc ctg ggc aag acc aag ggc agg acc cat cct gag atc 1410 
Arg Pro His Cys Leu Gly Lys Thr Lys Gly Arg Thr His Pro Glu Ile 
345 350 355 360 

gac cgc gag gtg gtg cgc agg ctg cgc gag ttc tac cgg cct ttc aac 1458 
Asp Arg Glu Val Val Arg Arg Leu Arg Glu Phe Tyr Arg Pro Phe Asn 
365 370 375 

ctc aag ttc tac cag atg acc ggg cac gac ttt ggc tgg gat 1500 
Leu Lys Phe Tyr Gln Met Thr Gly His Asp Phe Gly Trp Asp 
380 385 390 

tgagcagacc cgggctatgt accttaccca cgtggcttat ctattgacag agattatatg 1560 
tatgtaaaat gtacagaaat ctattttata ataatttatt tttaattcat aagcaattaa 1620 

ttcactaagc tgcctagcca cactctttag agagttagct tcataatctg ttaacattcc 1680 

aaagtgttta actctagtat ttcgttttct tcttcacaat tgatggtgct tctatttttt 1740 

cttctcccct acctgttata tttaaaacaa agaaaagcac aacttgagat ttttgttgtt 1800 

acgggtattc agccttcagt caccgtctga gttctccagt tgctgcctcc ttgtcttgtc 1860 

ttgggtctcc cattccagct tccctgtctc ttcctgcctg tgtacctcgt aggaacgctg 1920 

agctgcctca acagggctgt attctgaagg gcaggcctca tgcagcagcc tccttgcaga 1980 

tgtggtgtcc cgtccaatga tgtagcctga aagccacagc cctagggttc tg 2032 

<210> 10 

<211> 390 

<212> PRT 

<213> Homo sapiens 

<400> 10 

Met Gly Gln Arg Leu Ser Gly Gly Arg Ser Cys Leu Asp Val Pro Gly 
1 5 10 15 

Arg Leu Leu Pro Gln Pro Pro Pro Pro Pro Pro Pro Val Arg Arg Lys 
20 25 30 

Leu Ala Leu Leu Phe Ala Met Leu Cys Val Trp Leu Tyr Met Phe Leu 
35 40 45 

Tyr Ser Cys Ala Gly Ser Cys Ala Ala Ala Pro Gly Leu Leu Leu Leu 
50 55 60 

Gly Ser Gly Ser Arg Ala Ala His Asp Pro Pro Ala Leu Ala Thr Ala 
65 70 75 80 

Pro Asp Gly Thr Pro Pro Arg Leu Pro Phe Arg Ala Pro Pro Ala Thr 
85 90 95 

Pro Leu Ala Ser Gly Lys Glu Met Ala Glu Gly Ala Ala Ser Pro Glu 
100 105 110 

Glu Gln Ser Pro Glu Val Pro Asp Ser Pro Ser Pro Ile Ser Ser Phe 
115 120 125 

Phe Ser Gly Ser Gly Ser Lys Gln Leu Pro Gln Ala Ile Ile Ile Gly 
130 135 140 

Val Lys Lys Gly Gly Thr Arg Ala Leu Leu Glu Phe Leu Arg Val His 
145 150 155 160 

Pro Asp Val Arg Ala Val Gly Ala Glu Pro His Phe Phe Asp Arg Ser 
165 170 175 

Tyr Asp Lys Gly Leu Ala Trp Tyr Arg Asp Leu Met Pro Arg Thr Leu 
180 185 190 

Asp Gly Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Arg 
195 200 205 

Glu Ala Pro Ala Arg Ile Ser Ala Met Ser Lys Asp Thr Lys Leu Ile 
210 215 220 

Val Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln 
225 230 235 240 

Thr Leu Ser Lys Arg Pro Asp Ile Pro Thr Phe Glu Ser Leu Thr Phe 
245 250 255 

Lys Asn Arg Thr Ala Gly Leu Ile Asp Thr Ser Trp Ser Ala Ile Gln 
260 265 270 

Ile Gly Ile Tyr Ala Lys His Leu Glu His Trp Leu Arg His Phe Pro 
275 280 285 

Ile Arg Gln Met Leu Phe Val Ser Gly Glu Arg Leu Ile Ser Asp Pro 
290 295 300 

Ala Gly Glu Leu Gly Arg Val Gln Asp Phe Leu Gly Leu Lys Arg Ile 
305 310 315 320 

Ile Thr Asp Lys His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys 
325 330 335 

Leu Lys Lys Ala Glu Gly Ser Ser Arg Pro His Cys Leu Gly Lys Thr 
340 345 350 

Lys Gly Arg Thr His Pro Glu Ile Asp Arg Glu Val Val Arg Arg Leu 
355 360 365 

Arg Glu Phe Tyr Arg Pro Phe Asn Leu Lys Phe Tyr Gln Met Thr Gly 
370 375 380 

His Asp Phe Gly Trp Asp 
385 390 
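
The CDS coordinates given under <222> and the residue counts given under <211> of the corresponding protein entries can be cross-checked against each other, and individual codon lines can be compared with the three-letter translation printed beneath them. The following Python sketch is illustrative only and is not part of the listing as filed: the helper names `codon_count` and `translate` are hypothetical, and it assumes the standard genetic code and 1-based inclusive coordinates.

```python
from itertools import product

# Standard genetic code (an assumption of this sketch), with codons
# enumerated in t, c, a, g order for each of the three positions.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_count(start: int, end: int) -> int:
    """Number of codons in a CDS given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


def translate(codons: str) -> str:
    """Translate space-separated codons into one-letter amino acid codes.

    A codon containing a character other than a, c, g, t (for example a
    scanning error) raises KeyError, which flags the line for review.
    """
    return "".join(CODON_TABLE[c] for c in codons.lower().split())


# Coordinates as stated in the listing.
print(codon_count(331, 1500))   # compare with <211> of SEQ ID NO: 10
print(codon_count(847, 2214))   # compare with <211> of SEQ ID NO: 12
print(translate("atg ggg cag cgc ctg agt ggc ggc"))  # first codons of SEQ ID NO: 9
```

Under these assumptions the two counts can be read directly against the <211> values of SEQ ID NO: 10 and SEQ ID NO: 12, and the translated string against the first residues of SEQ ID NO: 10.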



<210> 11 

<211> 3658 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 

<222> (847) . . (2214) 
<220> 

<223> Predicted human 3-OST-4 hnRNA 
<400> 11 



gaggatatcc cgggcgagag aagggagggt cggggatggg ctgagttgga gtcccagagg 60 

aaaagcggaa gcgagagctt cgtcacccgc tgtcttccag ctcccggtgc gcggcaccgg 120 

aggcaggcgt tgggctttac ctctctaaaa gtactggggc aaaggaatgg agaacacggc 180 

gtcccgagct cccaagggag gggagtaaac gaggtggggt ggggaacacc ccaagtgcgt 240 

gcgtgctggg gggctggggg gcacgatctc cgttctcccg ggtgccccag ccctagcgca 300 

cgcctccgct cccccgcccc cttcgcaggc gcgcgcgagg cgcacccccc ttccctcggc 360 

ggcgccgggc gcgcgcccgg ccccctcctc ctcccctccg cgcctctcct ctctcccggc 420 

agaaagttag cagcggggaa ggaactctgg gctgcaacag cgcgcggcgg cggcggcaga 480 

ggctgaagca gaagccgcgg cggagccggg gaagcggggg cgctgcagac ggagcaggtg 540 

ccgccggcgg gtccgcgcgc ccccctcggt ccccttgcct gaggctgagg ggggggcggt 600 

ggtggggggg ccactcggac tcggcgggca gcgtggggcg srggggccatg cggccgggct 660 

cccccctggc gcagcgggac agcggccagg gccgggggcg cagcggcgtc gcttcatgca 720 

gccggggcgg ctgggcagcg gcggcggcgg cggcggcggc ggcggcgggg gcggcggctg 780 

aaaccatgtc cgggcagcgc cgggggctgc cgccgccgcc gccgccgccg cgagccggga 840 

gccgcg atg gcc cgg tgg ccc gca cct cct ccg cct ccg cct ccg cct 888 
Met Ala Arg Trp Pro Ala Pro Pro Pro Pro Pro Pro Pro Pro 
1 5 10 

cca cct ctg gcc gcg ccg ccg ccg ccc ggc gcc tct gct aag ggg ccg 936 
Pro Pro Leu Ala Ala Pro Pro Pro Pro Gly Ala Ser Ala Lys Gly Pro 
15 20 25 30 

ccg gcg cgc aag ctg ctt ttt atg tgc acc ttg tcc ctg tct gtc acc 984 
Pro Ala Arg Lys Leu Leu Phe Met Cys Thr Leu Ser Leu Ser Val Thr 
35 40 45 

tac ctg tgc tac agc ctc ctg ggc ggc tcg ggc tcc ctg caa ttc cct 1032 
Tyr Leu Cys Tyr Ser Leu Leu Gly Gly Ser Gly Ser Leu Gln Phe Pro 
50 55 60 

ctg gcg ctg cag gag tcg ccg ggc gcc gcc gcc gag ccc ccg ccg agc 1080 
Leu Ala Leu Gln Glu Ser Pro Gly Ala Ala Ala Glu Pro Pro Pro Ser 
65 70 75 

ccg ccg cca ccc tct ctg ctg cct acc ccc gtg cgc ctc ggc gcc ccc 1128 
Pro Pro Pro Pro Ser Leu Leu Pro Thr Pro Val Arg Leu Gly Ala Pro 
80 85 90 

tcg cag ccg ccc gcg ccg ccg ccg ctg gac aac gcg agc cac ggg gag 1176 
Ser Gln Pro Pro Ala Pro Pro Pro Leu Asp Asn Ala Ser His Gly Glu 
95 100 105 110 

ccg ccc gag ccc cca gag cag cca gcc gcc ccc ggg acc gac ggc tgg 1224 
Pro Pro Glu Pro Pro Glu Gln Pro Ala Ala Pro Gly Thr Asp Gly Trp 
115 120 125 

ggg ctg ccg agc ggc ggc gga ggc gcc cgg gac gcc tgg ctc cgg acc 1272 
Gly Leu Pro Ser Gly Gly Gly Gly Ala Arg Asp Ala Trp Leu Arg Thr 
130 135 140 

ccg ctg gcc ccc agc gag atg atc acg gct cag agc gcg ctg ccg gag 1320 
Pro Leu Ala Pro Ser Glu Met Ile Thr Ala Gln Ser Ala Leu Pro Glu 
145 150 155 

agg gaa gcg cag gag tcc agc acc acc gac gag gat ctc gca ggc cgg 1368 
Arg Glu Ala Gln Glu Ser Ser Thr Thr Asp Glu Asp Leu Ala Gly Arg 
160 165 170 

aga gcg gcc aac ggg agc agc gag agg ggc ggc gcc gtc agc acc ccc 1416 
Arg Ala Ala Asn Gly Ser Ser Glu Arg Gly Gly Ala Val Ser Thr Pro 
175 180 185 190 

gac tat ggg gag aag aag ctg cca cag gcg ctc atc atc ggg gtc aag 1464 
Asp Tyr Gly Glu Lys Lys Leu Pro Gln Ala Leu Ile Ile Gly Val Lys 
195 200 205 

aaa gga ggg acc cgc gcg ctg ctg gag gcg atc cgc gtg cac ccg gac 1512 
Lys Gly Gly Thr Arg Ala Leu Leu Glu Ala Ile Arg Val His Pro Asp 
210 215 220 

gtg cgg gcg gtg ggc gta gag ccg cac ttc ttc gac agg aac tac gaa 1560 
Val Arg Ala Val Gly Val Glu Pro His Phe Phe Asp Arg Asn Tyr Glu 
225 230 235 

aag ggg ttg gag tgg tac aga aat gtg atg ccc aag act ttg gat ggg 1608 
Lys Gly Leu Glu Trp Tyr Arg Asn Val Met Pro Lys Thr Leu Asp Gly 
240 245 250 
caa ata acc atg gag aag act cca agt tac ttt gtg aca aat gag gct 1656 
Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Asn Glu Ala 
255 260 265 270 

ccc aag cgc att cac tcc atg gcc aag gac atc aaa ctg att gtg gtg 1704 
Pro Lys Arg Ile His Ser Met Ala Lys Asp Ile Lys Leu Ile Val Val 
275 280 285 

gtg aga aac ccc gtg acc agg gcc atc tct gac tac acg cag aca ctg 1752 
Val Arg Asn Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu 
290 295 300 

tca aag aaa ccc gag atc ccc acc ttt gag gtg ctg gcc ttc aaa aac 1800 
Ser Lys Lys Pro Glu Ile Pro Thr Phe Glu Val Leu Ala Phe Lys Asn 
305 310 315 

cgg acc ctc ggg ctg atc gat gct tcc tgg agt gcc att cga ata ggg 1848 
Arg Thr Leu Gly Leu Ile Asp Ala Ser Trp Ser Ala Ile Arg Ile Gly 
320 325 330 

atc tat gcg ctg cat ctg gaa aac tgg ctc cag tat ttc ccc ctc tcc 1896 
Ile Tyr Ala Leu His Leu Glu Asn Trp Leu Gln Tyr Phe Pro Leu Ser 
335 340 345 350 

cag atc ctc ttt gtc agt ggt gag cga ctc att gtg gac ccc gcc ggg 1944 
Gln Ile Leu Phe Val Ser Gly Glu Arg Leu Ile Val Asp Pro Ala Gly 
355 360 365 

gaa atg gcc aaa gta cag gat ttt cta ggc ctc aaa cgt gtt gtg act 1992 
Glu Met Ala Lys Val Gln Asp Phe Leu Gly Leu Lys Arg Val Val Thr 
370 375 380 

aag aag cat ttc tat ttc aac aaa acc aag ggg ttc cct tgc cta aag 2040 
Lys Lys His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys 
385 390 395 

aag cca gaa gac agc agt gcc ccg agg tgc tta ggc aag agc aaa ggt 2088 
Lys Pro Glu Asp Ser Ser Ala Pro Arg Cys Leu Gly Lys Ser Lys Gly 
400 405 410 

cgg act cat cct cgc att gac cca gat gtc atc cac aga ctg agg aaa 2136 
Arg Thr His Pro Arg Ile Asp Pro Asp Val Ile His Arg Leu Arg Lys 
415 420 425 430 

ttc tac aaa ccc ttc aac ttg atg ttt tac caa atg act ggt caa gat 2184 
Phe Tyr Lys Pro Phe Asn Leu Met Phe Tyr Gln Met Thr Gly Gln Asp 
435 440 445 

ttt cag tgg gaa cag gaa gag ggt gat aaa tgaggctaga gaggcagagg 2234 
Phe Gln Trp Glu Gln Glu Glu Gly Asp Lys 
450 455 

aaggctagtc aataagctaa ggaggctcct tgcctgagtc cttgaatacc ccagcttctg 2294 

cagcttcact tgctggagtg ccaagtagat ctcctcctcc ttcatgcagc caggattgcc 2354 

tccagtgctg ttagcttagg caaacaggtg gatcccatgg catccccatg gaggaaccag 2414 

gcccatctgg gcagcagcat ctggttgacc agatggccac cagaacccac tgttcattct 2474 

tatcttctgc tagttaatat agcctgaaga cagaggataa atagttgtca atgtcagaga 2534 

cagtgctatt aatgtatatg tgagcgacaa aaaaggtctg ctttataggg gttctcactc 2594 

tagcttgggg agcccagggt tctagccctg tatctgtcat gggcacctgc tgtctaaacc 2654 

tctgcttggg cttctcccca gaatgcactt tgtggctgag tgctccagga ctcctaggga 2714 

gcaagctcct ccctctaagg tgtttctagt cttctcttta aaggtctcat cccacaaccc 2774 

ctgacttcct ccctccccac atcatigaagg cagaggcatg cacattcctc actgaaaaag 2834 

aaaacacaca cccacccaca cacacacaca cagaagaaaa tgaaagctga cacacctcga 2894 

agccttcttt ccaagagccc tctaaatggg gttgggtctc actcttcatg agtatcctgg 2954 

gttgtgcaga agcttagcat atgcccttgt gttcggatca ggcccacagg gctgctcaaa 3014 

gagtagagta attgtaaccg aggtcagagc tctggggttg gcagagatga gtggccatat 3074 

ctgggggtaa aagaagaaat cctgtcctct tggtgggagg ttaccttacc tgaagaccat 3134 

ctctcccaag cactgtagtt ctgagcatgt ttttggggtg gactctgtcc cctagggtcc 3194 

ctagaagggc aaagaccaga gagttgacaa gtctgttatt aggaataatc cttagccatg 3254 

taatggagaa aggagcagtc agcattcttc caatttgccc caccaccacc tcctcgggct 3314 

tcattttctc tatttagaga tggcagagag tgaggtagtg gcgagaaagc tgactccatt 3374 

catcagatcc agtttatgag ggttgggggt gagcaagggc tgtctgcaga aacccccatc 3434 

aagagctgcti gaatgaagtg tcccttccca tcagtttgat tcaattaaaa tgcatcattt 3494 

gacataaagc acttgttcac agatctccaa aaccaggaat tgttctagta aaactggaaa 3554 

tttgtatgag tggggggagt taaatctgtt cagctgttat taaactgtca tttctcccgc 3614 

taaatgaaaa ccgtgttgtt ataaagctta atgcaacctg atta 3658 



<210> 12 
<211> 456 
<212> PRT 

<213> Homo sapiens 
<400> 12 

Met Ala Arg Trp Pro Ala Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro 
1 5 10 15 

Leu Ala Ala Pro Pro Pro Pro Gly Ala Ser Ala Lys Gly Pro Pro Ala 
20 25 30 

Arg Lys Leu Leu Phe Met Cys Thr Leu Ser Leu Ser Val Thr Tyr Leu 
35 40 45 

Cys Tyr Ser Leu Leu Gly Gly Ser Gly Ser Leu Gln Phe Pro Leu Ala 
50 55 60 

Leu Gln Glu Ser Pro Gly Ala Ala Ala Glu Pro Pro Pro Ser Pro Pro 
65 70 75 80 

Pro Pro Ser Leu Leu Pro Thr Pro Val Arg Leu Gly Ala Pro Ser Gln 
85 90 95 

Pro Pro Ala Pro Pro Pro Leu Asp Asn Ala Ser His Gly Glu Pro Pro 
100 105 110 

Glu Pro Pro Glu Gln Pro Ala Ala Pro Gly Thr Asp Gly Trp Gly Leu 
115 120 125 

Pro Ser Gly Gly Gly Gly Ala Arg Asp Ala Trp Leu Arg Thr Pro Leu 
130 135 140 

Ala Pro Ser Glu Met Ile Thr Ala Gln Ser Ala Leu Pro Glu Arg Glu 
145 150 155 160 

Ala Gln Glu Ser Ser Thr Thr Asp Glu Asp Leu Ala Gly Arg Arg Ala 
165 170 175 

Ala Asn Gly Ser Ser Glu Arg Gly Gly Ala Val Ser Thr Pro Asp Tyr 
180 185 190 

Gly Glu Lys Lys Leu Pro Gln Ala Leu Ile Ile Gly Val Lys Lys Gly 
195 200 205 

Gly Thr Arg Ala Leu Leu Glu Ala Ile Arg Val His Pro Asp Val Arg 
210 215 220 

Ala Val Gly Val Glu Pro His Phe Phe Asp Arg Asn Tyr Glu Lys Gly 
225 230 235 240 

Leu Glu Trp Tyr Arg Asn Val Met Pro Lys Thr Leu Asp Gly Gln Ile 
245 250 255 

Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Asn Glu Ala Pro Lys 
260 265 270 

Arg Ile His Ser Met Ala Lys Asp Ile Lys Leu Ile Val Val Val Arg 
275 280 285 

Asn Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys 
290 295 300 

Lys Pro Glu Ile Pro Thr Phe Glu Val Leu Ala Phe Lys Asn Arg Thr 
305 310 315 320 

Leu Gly Leu Ile Asp Ala Ser Trp Ser Ala Ile Arg Ile Gly Ile Tyr 
325 330 335 

Ala Leu His Leu Glu Asn Trp Leu Gln Tyr Phe Pro Leu Ser Gln Ile 
340 345 350 

Leu Phe Val Ser Gly Glu Arg Leu Ile Val Asp Pro Ala Gly Glu Met 
355 360 365 

Ala Lys Val Gln Asp Phe Leu Gly Leu Lys Arg Val Val Thr Lys Lys 
370 375 380 

His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Pro 
385 390 395 400 

Glu Asp Ser Ser Ala Pro Arg Cys Leu Gly Lys Ser Lys Gly Arg Thr 
405 410 415 

His Pro Arg Ile Asp Pro Asp Val Ile His Arg Leu Arg Lys Phe Tyr 
420 425 430 

Lys Pro Phe Asn Leu Met Phe Tyr Gln Met Thr Gly Gln Asp Phe Gln 
435 440 445 

Trp Glu Gln Glu Glu Gly Asp Lys 
450 455 

<210> 13 
<211> 284 
<212> PRT 

<213> Homo sapiens 
<220> 

<223> human NST-1 (aa 599 to 882) 
<400> 13 

Lys Thr Cys Asp Arg Phe Pro Lys Leu Leu Ile Ile Gly Pro Gln Lys 
1 5 10 15 

Thr Gly Thr Thr Ala Leu Tyr Leu Phe Leu Gly Met His Pro Asp Leu 
20 25 30 

Ser Ser Asn Tyr Pro Ser Ser Glu Thr Phe Glu Glu Ile Gln Phe Phe 
35 40 45 

Asn Gly His Asn Tyr His Lys Gly Ile Asp Trp Tyr Met Glu Phe Phe 
50 55 60 

Pro Ile Pro Ser Asn Thr Thr Ser Asp Phe Tyr Phe Glu Lys Ser Ala 
65 70 75 80 

Asn Tyr Phe Asp Ser Glu Val Ala Pro Arg Arg Ala Ala Ala Leu Leu 
85 90 95 

Pro Lys Ala Lys Val Leu Thr Ile Leu Ile Asn Pro Ala Asp Arg Ala 
100 105 110 

Tyr Ser Trp Tyr Gln His Gln Arg Ala His Asp Asp Pro Val Ala Leu 
115 120 125 

Lys Tyr Thr Phe His Glu Val Ile Thr Ala Gly Ser Asp Ala Ser Ser 
130 135 140 

Lys Leu Arg Ala Leu Gln Asn Arg Cys Leu Val Pro Gly Trp Tyr Ala 
145 150 155 160 

Thr His Ile Glu Arg Trp Leu Ser Ala Tyr His Ala Asn Gln Ile Leu 
165 170 175 

Val Leu Asp Gly Lys Leu Leu Arg Thr Glu Pro Ala Lys Val Met Asp 
180 185 190 

Met Val Gln Lys Phe Leu Gly Val Thr Asn Thr Ile Asp Tyr His Lys 
195 200 205 

Thr Leu Ala Phe Asp Pro Lys Lys Gly Phe Trp Cys Gln Leu Leu Glu 
210 215 220 

Gly Gly Lys Thr Lys Cys Leu Gly Lys Ser Lys Gly Arg Lys Tyr Pro 
225 230 235 240 

Glu Met Asp Leu Asp Ser Arg Ala Phe Leu Lys Asp Tyr Tyr Arg Asp 
245 250 255 

His Asn Ile Glu Leu Ser Lys Leu Leu Tyr Lys Met Gly Gln Thr Leu 
260 265 270 

Pro Thr Trp Leu Arg Glu Asp Leu Gln Asn Thr Arg 
275 280 



<210> 14 

<211> 286 

<212> PRT 

<213> Homo sapiens 

<220> 

<223> human NST-2 (aa 598 to 883) 
<400> 14 

Lys Thr Cys Asp Arg Leu Pro Lys Phe Leu Ile Val Gly Pro Gln Lys 
1 5 10 15 

Thr Gly Thr Thr Ala Ile His Phe Phe Leu Ser Leu His Pro Ala Val 
20 25 30 

Thr Ser Ser Phe Pro Ser Pro Ser Thr Phe Glu Glu Ile Gln Phe Phe 
35 40 45 

Asn Ser Pro Asn Tyr His Lys Gly Ile Asp Trp Tyr Met Asp Phe Phe 
50 55 60 

Pro Val Pro Ser Asn Ala Ser Thr Asp Phe Leu Phe Glu Lys Ser Ala 
65 70 75 80 

Thr Tyr Phe Asp Ser Glu Val Val Pro Arg Arg Gly Ala Ala Leu Leu 
85 90 95 

Pro Arg Ala Lys Ile Ile Thr Val Leu Thr Asn Pro Ala Asp Arg Ala 
100 105 110 

Tyr Ser Trp Tyr Gln His Gln Arg Ala His Gly Asp Pro Val Ala Leu 
115 120 125 

Asn Tyr Thr Phe Tyr Gln Val Ile Ser Ala Ser Ser Gln Thr Pro Leu 
130 135 140 

Ala Leu Arg Ser Leu Gln Asn Arg Cys Leu Val Pro Gly Tyr Tyr Ser 
145 150 155 160 

Thr His Leu Gln Arg Trp Leu Thr Tyr Tyr Pro Ser Gly Gln Leu Leu 
165 170 175 

Ile Val Asp Gly Gln Glu Leu Arg Thr Asn Pro Ala Ala Ser Met Glu 
180 185 190 

Ser Ile Gln Lys Phe Leu Gly Ile Thr Pro Phe Leu Asn Tyr Thr Arg 
195 200 205 

Thr Leu Arg Phe Asp Asp Asp Lys Gly Phe Trp Cys Gln Gly Leu Glu 
210 215 220 

Gly Gly Lys Thr Arg Cys Leu Gly Arg Ser Lys Gly Arg Arg Tyr Pro 
225 230 235 240 

Asp Met Asp Thr Glu Ser Arg Leu Phe Leu Thr Asp Phe Phe Arg Asn 
245 250 255 

His Asn Leu Glu Leu Ser Lys Leu Leu Ser Arg Leu Gly Gln Pro Val 
260 265 270 

Pro Ser Trp Leu Arg Glu Glu Leu Gln His Ser Ser Leu Gly 
275 280 285 



<210> 15 
<211> 291 
<212> PRT 

<213> Caenorhabditis elegans 
<220> 

<223> putative C. elegans 3-OST 
<400> 15 

Met Lys Tyr Arg Leu Leu Leu Ile Leu His Leu Ile Asp Leu Ile Ser 
1 5 10 15 

Cys Gly Val Ile Pro Asn Thr Ser Lys Lys Arg Phe Pro Asp Ala Ile 
20 25 30 

Ile Val Gly Val Lys Lys Ser Gly Thr Arg Ala Leu Leu Glu Phe Leu 
35 40 45 

Arg Val Asn Pro Leu Ile Lys Ala Pro Gly Pro Glu Val His Phe Phe 
50 55 60 

Asp Lys Asn Phe Asn Lys Gly Leu Glu Trp Tyr Arg Glu Gln Met Pro 
65 70 75 80 

Glu Thr Lys Phe Gly Glu Val Thr Ile Glu Lys Ser Pro Ala Tyr Phe 
85 90 95 

His Ser Lys Met Ala Pro Glu Arg Ile Lys Ser Leu Asn Pro Asn Thr 
100 105 110 

Lys Ile Ile Ile Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser Asp 
115 120 125 

Tyr Thr Gln Ser Ser Ser Lys Arg Lys Arg Val Gly Leu Met Pro Ser 
130 135 140 

Phe Glu Thr Met Ala Val Gly Asn Cys Ala Asn Trp Leu Arg Thr Asn 
145 150 155 160 

Cys Thr Thr Lys Thr Arg Gly Val Asn Ala Gly Trp Gly Ala Ile Arg 
165 170 175 

Ile Gly Val Tyr His Lys His Met Lys Arg Trp Leu Asp His Phe Pro 
180 185 190 

Ile Glu Asn Ile His Ile Val Asp Gly Glu Lys Leu Ile Ser Asn Pro 
195 200 205 

Ala Asp Glu Ile Ser Ala Thr Glu Lys Phe Leu Gly Leu Lys Pro Val 
210 215 220 

Ala Lys Pro Glu Lys Phe Gly Val Asp Pro Ile Lys Lys Phe Pro Cys 
225 230 235 240 

Ile Lys Asn Glu Asp Gly Lys Leu His Cys Leu Gly Lys Thr Lys Gly 
245 250 255 

Arg His His Pro Asp Val Glu Pro Ser Val Leu Lys Thr Leu Arg Glu 
260 265 270 

Phe Tyr Gly Pro Glu Asn Lys Lys Phe Tyr Gln Met Ile Asn His Trp 
275 280 285 

Phe Asp Trp 
290 

<210> 16 

<211> 4045 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> 3-OST-4 5' promoter/exon 
<400> 16 

gaattctgtg ggtgttggca ggggagacag aaaactatct tccatcgagt cttcggatcc 60 
attgggaatg cctggatgac gtcagagttc gccctgtgta ggtagctccc acttttcatt 120 
gtaggtttct caaggacttg ctcctagaaa aagcgtggct caaaagtaga taaaaaatag 180 
gcaactgcct aagtgtgaaa tttacaaagt tcctctccaa aaaagcccgc ctcctcccta 240 
tcacttgtgg gcctgacatt ttaccaaagg ggctctattc tttcaagagt 

2160 
ccaagggagg ggagtaaacg aggtggggtg 2220 
gggaacaccc caagtgcgtg cgtgctgggg ggctgggggg cacgatctcc gttctcccgg 2280 







[SEQ ID NO: 16, positions 2281 to 3840: sequence illegible in the source scan] 


gtctgaatct gcccagctcc aagcctggga acccccagcc ctcctgcctg ctgggtgttt 3900 
ccgaaaccag gctcttgcgg ggttctggga ttctgggcag aggactttga ggagtgagac 3960 
aggatggcta aattgactaa ggggatttga ggtcccctgg aatctcttaa aatcaccctc 4020 
aaacgcatitt gcgtggctgg aattc 4045 






Figure 1: alignment of the m3-OST-1 and h3-OST-1 amino acid sequences (alignment not recoverable from the scan) 




LIU, Jian 

FRITZE, Linda S. 
SCHWARTZ, John J. 
ZHANG, Lijuan 

Massachusetts Institute of Technology 

<120> HEPARAN SULFATE D-GLUCOSAMINYL 3-O-SULFOTRANSFERASES, 
AND USES THEREFOR 

<130> MIT-Oe7PC 

<140> 

^;15j:> 2957-10-31 

<151> 19S7-1Q-34 
<160> 16 

<170> PatentIn Ver. 2.0 

<210> 1 

<211> 4.6&5 

<212> DNA 

<213> Mus musculus 

<220> 
<221> CDS 

<222> (323) . . (1255) 
<223> mouse 3-OST-1 

<400> 1 

tgcattgeaa tgtgraagtgt tccbgaataa acctgcttga ASaaggacaa ogtggtgttg 60 

cgtetttCCt gctggtoggg gtggaataga <?MCti3Ccct ttttaacttg ggtgACCtca 12D 

tgARcataaa agaacttaas ggiragcAlljc dAtggacbta aagtaggetg dCCtbsswict ISO 

c»gragatctt ettggfCdatg tctciggaga ttaaagtaat tggc^actgg Agatactcat 24D 

gttccagta^ fccaagaggga gccttgctgc tacttcatga fcccaggicgcs tgtggcoeag 3O0 
eta geg atg c?tc iigc ctg cat gcc gat ^tt get go<i ©cb s^^a aac* gag 592
Leu Glu Met Leu Ser Leu His Pro Asp Val Ala Ala Ala Glu Asn Glu
75 80 85 90

gtc cat ttc ttt gac tgg gag gag cat tac age caa ggc ctg ggc tgg 640
Val His Phe Phe Asp Trp Glu Glu His Tyr Ser Gln Gly Leu Gly Trp
95 100 105

tac etc acc CJag atg ccc tta tee tec cct cac ea& etc acc gtg ga& 688
Tyr Leu Thr Gln Met Pro Phe Ser Ser Pro His Gln Leu Thr Val Glu
65 70 75 80

Ala Ala Ala Glu Asn Glu Val His Phe Phe Asp Trp Glu Glu His Tyr
85 90 95

Ser His Gly Leu Gly Trp Tyr Leu Ser Gln Met Pro Phe Ser Trp Pro
100 105 110

His Gln Leu Thr Val Glu Lys Thr Pro Ala Tyr Phe Thr Ser Pro Lys
115 120 125

Val Pro Glu Arg Val Tyr Ser Met Asn Pro Ser Ile Arg Leu Leu Leu
130 135 140

145 150 155 160

Phe Tyr Asn His Met Gln Lys His Lys Pro Tyr Pro Ser Ile Glu Glu
165 170 175

Phe Leu Val Arg Asp Gly Arg Leu Asn Val Asp Tyr Lys Ala Leu Asn
180 185 190

Arg Ser Leu Tyr His Val His Met Gln Asn Trp Leu Arg Phe Phe Pro
195 200 205

Leu Arg His Ile His Ile Val Asp Gly Asp Arg Leu Ile Arg Asp Pro
210 215 220

Phe Pro Glu Ile Gln Lys Val Glu Arg Phe Leu Lys Leu Ser Pro Gln
225 230 235 240

Ile Asn Ala Ser Asn Phe Tyr Phe Asn Lys Thr Lys Gly Phe Tyr Cys
245 250 255

Leu Arg Asp Ser Gly Arg Asp Arg Cys Leu His Glu Ser Lys Gly Arg
260 265 270

Ala His Pro Gln Val Asp Pro Lys Leu Leu Asn Lys Leu His Glu Tyr
275 280 285

Phe His Glu Pro Asn Lys Lys Phe Phe Glu Leu Val Gly Arg Thr Phe
290 295 300
<210> 5
<211> 1951

<212> DNA

<213> Homo sapiens

<220>

<221> CDS

<222> (73)..(1173)

<223> human 3-OST-2

<400> 5

cgcagggccr^^ cagcagccoa gocscoggtg ccccoLogga Aaccabgacc ccoggogcgg 60

gcccAt9>3aig cc atg gcc tat agg gto cstg ggc? Cg<5 gcg ggg cca cot CAJ 111
Met Ala Tyr Arg Val Leu Gly Arg Ala Gly Pro Pro Gln
1 5 10

<ycg egg agg gcg cgc agg ctg etc tbc goc ttc acg otc teg etc tec 159
Pro Arg Arg Ala Arg Arg Leu Leu Phe Ala Phe Thr Leu Ser Leu Ser
15 20 25

tgc act tac! ctg tgt tao age tte ctg tgc tgc tgc ga<3 gac etg ggt 207
Cys Thr Tyr Leu Cys Tyr Ser Phe Leu Cys Cys Cys Asp Asp Leu Gly
30 35 40 45

csgrg egc! cgc etc etc ggc gcg cet ogfcs t^y« etc cgc ggc ccc age 9cg 255
Arg Ser Arg Leu Leu Gly Ala Pro Arg Cys Leu Arg Gly Pro Ser Ala
50 55 60

ggfe ^gt2 CA^ aas ctt etc cag $ag tCCJ cgc ccc tgt gat cc^C tec ggg 303
Gly Gly Gln Lys Leu Leu Gln Lys Ser Arg Pro Cys Asp Pro Ser Gly
65 70 75

ccg acg ccc age gag ?^g^ get ccc sgc gcg coc gcc gee gee gtg 351
Pro Thr Pro Ser Glu Pro Ser Ala Pro Ser Ala Pro Ala Ala Ala Val
80 85 90

ccc gcc cct cgc etc tciC tec asc cac ticc ggc? tcci ccc aag ctg 399
Pro Ala Pro Arg Leu Ser Gly Ser Asn His Ser Gly Ser Pro Lys Leu
95 100 105

ggt acc aag ttg ccc caa gac etc €^tt gtg ggc gtg aag aag gg^g 447
Gly Thr Lys Arg Leu Pro Gln Ala Leu Ile Val Gly Val Lys Lys Gly
110 115 120 125

§gc acc egg gcc gbg cbg gag ttt ate cga gta Cac ccg gac gtg egg 495
Gly Thr Arg Ala Val Leu Glu Phe Ile Arg Val His Pro Asp Val Arg
130 135 140

gcc ttg ggo acg gaa coc cac ttc ttt gao a$g aac tac ggc cgc= 543
Ala Leu Gly Thr Glu Pro His Phe Phe Asp Arg Asn Tyr Gly Arg Gly
145 150 155

ctg gat tgg toe agg age ctg atg c<qc acc etc gag agg C2A9 ate 591
Leu Asp Trp Tyr Arg Ser Leu Met Pro Arg Thr Leu Glu Ser Gln Ile
160 165 170

acg ctg gag aag acg CCC? «tgc tac ttt gtc act c^a gag get cct ega 639
Thr Leu Glu Lys Thr Pro Ser Tyr Phe Val Thr Gln Glu Ala Pro Arg
175 180 185

oga ate ttc aac atg tec cga gae acc ctg ate gtg gfet gtg csgg 687
Arg Ile Phe Asn Met Ser Arg Asp Thr Lys Leu Ile Val Val Val Arg
190 195 200 205

Asn Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys
210 215 220

aag ccc gac ate ccg acc^ ttt gag ggc etc tee ttc <5!gC aac cgc acc 783
Lys Pro Asp Ile Pro Thr Phe Glu Gly Leu Ser Phe Arg Asn Arg Thr
225 230 235

et9 cbg gtg gae gtg tcg tgg aac gee ate cgc ate ff^C atg tac 831
Leu Gly Leu Val Asp Val Ser Trp Asn Ala Ile Arg Ile Gly Met Tyr
240 245 250

gtg ctg cac ctg ga? age fcgg ctg cag tao ttc <fOg eta get cag att 879
Val Leu His Leu Glu Ser Trp Leu Gln Tyr Phe Pro Leu Ala Gln Ile
255 260 265

cac ttc gto a^t ggc gag cga etc ate set 0ac cog gee ggc gag etg 927
His Phe Val Ser Gly Glu Arg Leu Ile Thr Asp Pro Ala Gly Glu Met
270 275 280 285

ggg c^a gtc cag gac tte ctg ggc att aag aga ttc ate acg jac aag 975
Gly Arg Val Gln Asp Phe Leu Gly Ile Lys Arg Phe Ile Thr Asp Lys
290 295 300

cac ttc tat ttc aac aag acc aaa gga tte c^ct tgc ttg aaa aaa eca 1023
His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Thr
305 310 315

gaa teg age <?tc ctg cct cga tgc ttg a&e aaa tea aaa ggg ag^ act 1071
Glu Ser Ser Leu Leu Pro Arg Cys Leu Gly Lys Ser Lys Gly Arg Thr
320 325 330

cat gta cag a&ti gat: cct gas gt.g ^ac cag etc cga gaa ttt tat 1119
His Val Gln Ile Asp Pro Glu Val Ile Asp Gln Leu Arg Glu Phe Tyr
335 340 345

aga acq tal: «at ate ttt tat gaa ace gt^ cag gao t^c agg 1167
Arg Pro Tyr Asn Ile Lys Phe Tyr Glu Thr Val Gly Gln Asp Phe Arg
350 355 360 365

tgg gaa t3«^<JCd*cg aaaggaaagg gctctcaagg gcftcttctgc tcatctctte 1223
Trp Glu

cgt:gagact.r gctccca^^c cotct^tct: CGC&ccaaca ftacFcctggct ccagccccct 1283
tt! cccaacitt g^g&tgcatc atcttggaac> GaggaagcfCic^ ^gctaaagcc aagagaoaag 1343
sgag 1: C!t9 CCfa.ct'agtt-t:^ t-cateag^GC- gt.tcaBgc!M cgabctg cbcct.ggaac 1403
gtccagtaaa ttccagaatc attetccFttt ctgciccabaa agggccrtgg agaa&t^ctt 1463
tsiagaagagt gaatgttcca atgatgatag Atattataag cgaogatgg^ tctgttgcta 1523
tgaacacagc tgtCiAttgtc cacccaggag tggc= cjttgtt A^tbccaagt 1583
ggcatgtatc ttCC!Ctc=t5a gcttcatttc t^t:caagat.gc tctg^^tggt gggat^ggae 1643
a^jeatCCtCA gccctcDtca gaccL^t^ca attcatt^ACi a^atbgcaaa gctgaaagcf& 1703
ctcDtgggag acagaccctt tggCgatgaa ataaaccagt. gactteagag 1763
cctatggtct DaactLgtgct tgaaeiaacac tQtctctgaa aacaactti^g t^$itfcctccc 1823
tgctccetgt acabaabtcb gcrgrtacgg gtac?tttgct catacgagct 1883
tt<?atgttc* gcatgcaatg gaa&Gat:gct tgtacatgtg Mabaaabat g^cbcbcbcg 1943
tgt:c!Ctt<i














<210> 6
<211> 367
<212> PRT
<213> Homo sapiens

<400> 6

Met Ala Tyr Arg Val Leu Gly Arg Ala Gly Pro Pro Gln Pro Arg Arg
1 5 10 15

Ala Arg Arg Leu Leu Phe Ala Phe Thr Leu Ser Leu Ser Cys Thr Tyr
20 25 30

Leu Cys Tyr Ser Phe Leu Cys Cys Cys Asp Asp Leu Gly Arg Ser Arg
35 40 45

Leu Leu Gly Ala Pro Arg Cys Leu Arg Gly Pro Ser Ala Gly Gly Gln
50 55 60

Lys Leu Leu Gln Lys Ser Arg Pro Cys Asp Pro Ser Gly Pro Thr Pro
65 70 75 80

Ser Glu Pro Ser Ala Pro Ser Ala Pro Ala Ala Ala Val Pro Ala Pro
85 90 95

Arg Leu Ser Gly Ser Asn His Ser Gly Ser Pro Lys Leu Gly Thr Lys
100 105 110

Arg Leu Pro Gln Ala Leu Ile Val Gly Val Lys Lys Gly Gly Thr Arg
115 120 125

Ala Val Leu Glu Phe Ile Arg Val His Pro Asp Val Arg Ala Leu Gly
130 135 140

Thr Glu Pro His Phe Phe Asp Arg Asn Tyr Gly Arg Gly Leu Asp Trp
145 150 155 160

Tyr Arg Ser Leu Met Pro Arg Thr Leu Glu Ser Gln Ile Thr Leu Glu
165 170 175

Lys Thr Pro Ser Tyr Phe Val Thr Gln Glu Ala Pro Arg Arg Ile Phe
180 185 190

Asn Met Ser Arg Asp Thr Lys Leu Ile Val Val Val Arg Asn Pro Val
195 200 205

Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys Lys Pro Asp
210 215 220

Ile Pro Thr Phe Glu Gly Leu Ser Phe Arg Asn Arg Thr Leu Gly Leu
225 230 235 240

Val Asp Val Ser Trp Asn Ala Ile Arg Ile Gly Met Tyr Val Leu His
245 250 255

Leu Glu Ser Trp Leu Gln Tyr Phe Pro Leu Ala Gln Ile His Phe Val
260 265 270

Ser Gly Glu Arg Leu Ile Thr Asp Pro Ala Gly Glu Met Gly Arg Val
275 280 285

Gln Asp Phe Leu Gly Ile Lys Arg Phe Ile Thr Asp Lys His Phe Tyr
290 295 300

Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Thr Glu Ser Ser
305 310 315 320

Leu Leu Pro Arg Cys Leu Gly Lys Ser Lys Gly Arg Thr His Val Gln
325 330 335

Ile Asp Pro Glu Val Ile Asp Gln Leu Arg Glu Phe Tyr Arg Pro Tyr
340 345 350

Asn Ile Lys Phe Tyr Glu Thr Val Gly Gln Asp Phe Arg Trp Glu
355 360 365

<210> 7
<211> 2314
<212> DNA

<213> Homo sapiens

<220>
<221> CDS

<222> (799)..(2016)
<223> human 3-OST-3A

<400> 7

CJ^gc^gcggc acsaggaggca gcc^gtgasc gcctgcgage agagtggogg gsgcagct$<i 60
caggtcccga gcagcccagc ccagcccagc cacgeggctc acaggtgggg tccAsigagca 120
gttbggagca accog^o^ct acggagaggg gtggacggct ctgcacgggc ot^ct^cto 180
ccgctcgggc a^gggactc ggggggacct egctectfcgg ccgagagaao cft&aactcgg 240
cgcgcccagg cgggcaag§g gac cagagaa ggaagtca.ct 300
5t:D9ctcjgcc actg^ctugga gcgcacggag cgcagaggc^ cggcagccgc gog'tgccet.G 360
agccagt.^at •^cagga cgc bccgD^Gcga ga.agb ctc&c 420
Gt.gaga.09ca agDi:ccGcga.t tccggcgiMCit- ct t'go^gigga accgagg-^^^ c*a&g^c t^cc 480
gactbgggD^ agucbaagac ET Jl A 540
iLBagca c^^ga cefc^C!^^ tec cct b-cccgcc GGCt ogactg gagg cag^g^ jv 600
tocjtgcagagg att-ccgt&tc cccgcggagc ceoggecgct 660
acagttc^ca cggccacagg ggcgcscggc gatgtggcct dcgtccagog cgGrggcccg 720
ccggggggat gcrctggcac etgtc^ggfSt ccflg^^cc&ag catggGcggc gGgttgccc!9 780




cggctagg at^ 9^c:c dct ccg ggc ccg goo agt gcc CStCF tccs 831
Met Ala Pro Pro Gly Pro Ala Ser Ala Leu Ser
1 5 10

acc teg gee gug <:?cS5 cFbg tec ogc agG atG tto egg a&er ttc ttg ctg 879
Thr Ser Ala Glu Pro Leu Ser Arg Ser Ile Phe Arg Lys Phe Leu Leu
15 20 25

atg <?tc tgc tec ctg etc acg tec ctt liM ^tc ttc tac tgc ctg gcc 927
Met Leu Cys Ser Leu Leu Thr Ser Leu Tyr Val Phe Tyr Cys Leu Ala
30 35 40

gag cgc tgc gag acc ctg tec ggc ccc gtc gtg ggg Gt.g tec ggc5 ggc: 975
Glu Arg Cys Gln Thr Leu Ser Gly Pro Val Val Gly Leu Ser Gly Gly
45 50 55

ggc gag gag gcg ggg gcc cct ggt ggc ggo gtc< ^tg gc9 gga ggo ccg 1023
Gly Glu Glu Ala Gly Ala Pro Gly Gly Gly Val Leu Ala Gly Gly Pro
60 65 70 75

agg gag ctg gog gtg tgg ccg gcg gcg gca cag aga aag cgc etc ct;g 1071
Arg Glu Leu Ala Val Trp Pro Ala Ala Ala Gln Arg Lys Arg Leu Leu
80 85 90

caa otg ccg <:«lg tgg ojg agg cgc egg ccg ccc gog gcc; cgc^ gac gAc 1119
Gln Leu Pro Gln Trp Arg Arg Arg Arg Pro Pro Ala Pro Arg Asp Asp
95 100 105

ggc gag gag gcg gcc tgg gaa gaa gag tCC CCb ggc ctg tea ggg ggt 1167
Gly Glu Glu Ala Ala Trp Glu Glu Glu Ser Pro Gly Leu Ser Gly Gly
110 115 120

ccg ggc ggc tec ggg gcc ggA «tgc a^c gtg gcc gag gcc cog oeg ggg 1215
Pro Gly Gly Ser Gly Ala Gly Ser Thr Val Ala Glu Ala Pro Pro Gly
125 130 135

a DC ctg 0csg ctg etc ctg gac gaa ggc age aag C!&9 ctg CFcig ^ag gcc 1263
Thr Leu Ala Leu Leu Leu Asp Glu Gly Ser Lys Gln Leu Pro Gln Ala
140 145 150 155

afcc ate ate gga gtg aag aag ggc ggc acg egg gcg ctg ctg gag ttc 1311
Ile Ile Ile Gly Val Lys Lys Gly Gly Thr Arg Ala Leu Leu Glu Phe
160 165 170

ccg cgc gtg cac cce gac gtg cgc gcc gtg ggc gee? gag ccc? Cftc tte 1359
Leu Arg Val His Pro Asp Val Arg Ala Val Gly Ala Glu Pro His Phe
175 180 185

ttc 9ae egc age tae gac aag ggc etc gcc tgg tAC Cgg getc ctg 1407
Phe Asp Arg Ser Tyr Asp Lys Gly Leu Ala Trp Tyr Arg Asp Leu Met
190 195 200

Dcc aga ace ctg gac ggg cag ate ace atg gag a«g a<?g ccc agt tac 1455
Pro Arg Thr Leu Asp Gly Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr
205 210 215

tte gtc! ac*g egg gag gee ectf gcg cgc ate tcsg gcc atg toe; aag gac 1503
Phe Val Thr Arg Glu Ala Pro Ala Arg Ile Ser Ala Met Ser Lys Asp
220 225 230 235

*dcr aag etc cite gtg gtg gEtg egg s^u ccg gtg aoc agg gcc «tc teg 1551
Thr Lys Leu Ile Val Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser
240 245 250

gac teic ac5 cag acg dtg tec aag egg ccc gae ate ccc ace tfcc gag 1599
Asp Tyr Thr Gln Thr Leu Ser Lys Arg Pro Asp Ile Pro Thr Phe Glu
255 260 265

age ttg acg etc aac egg aoA gcg SSfC etc abc gac acg teg l^gg 1647
Ser Leu Thr Phe Lys Asn Arg Thr Ala Gly Leu Ile Asp Thr Ser Trp
270 275 280

age gcic afcc cag ate ggc ate tac gcc aag cae ctg eac tgg ctg 1695
Ser Ala Ile Gln Ile Gly Ile Tyr Ala Lys His Leu Glu His Trp Leu
285 290 295

ego cac ttc ccc ate cgc cag atg etc ttc gt9 age ggc gag egg cto 1743
Arg His Phe Pro Ile Arg Gln Met Leu Phe Val Ser Gly Glu Arg Leu
300 305 310 315

ate age gac eeg gee ggg gag Ct^ ggc cgc gfcg caa gae tta ctg ggcf 1791
Ile Ser Asp Pro Ala Gly Glu Leu Gly Arg Val Gln Asp Phe Leu Gly
320 325 330

etc aag agg ate ate acg gac aag cae ttc tae ttC ace aag 1839
Leu Lys Arg Ile Ile Thr Asp Lys His Phe Tyr Phe Asn Lys Thr Lys
335 340 345

ggc ttc ccc tgo ctg aag aag geg gag ggc Agc age egg ccc cat tgc 1887
Gly Phe Pro Cys Leu Lys Lys Ala Glu Gly Ser Ser Arg Pro His Cys
350 355 360

ctg gge aag aoo ^^ag 9gc <»gg acc cat cct gag ate gae cgc gag gtg 1935
Leu Gly Lys Thr Lys Gly Arg Thr His Pro Glu Ile Asp Arg Glu Val
365 370 375

gtg cgc agg ctg cgc gag ttc tac egg oct ttc? a»c etc aag ttc tac 1983
Val Arg Arg Leu Arg Glu Phe Tyr Arg Pro Phe Asn Leu Lys Phe Tyr
380 385 390 395

cag atg 51CC ggg cac gac ttt gge tgg gat gga taaecat^ta fttttaaaaag 2036
Gln Met Thr Gly His Asp Phe Gly Trp Asp Gly
400 405

aaaasiaaaaa tcaaaatata atatattttt; tteccaateg gtagagaaga gacagtttaa 2096
tatttgtgct gaaaatatgt ttcagtattt ttttCSAatg* atgttaagag attgttetc?ei 2156
etcccgcccc ^tCttMtgt atAacoaoca ecaaacaegt ggatcaaeag aaaaggra^ntn 2216
tttcact09t ctaaaoaett toast ttt ca gtttttattt tatgttctat 4ibacccag6c 2276
ataa^gtata agcatcagtt gtcattaaaa g&ttteag 2314

<210> 8
<211> 406
<212> PRT
<213> Homo sapiens

Met Ala Pro Pro Gly Pro Ala Ser Ala Leu Ser Thr Ser Ala Glu Pro
1 5 10 15

Leu Ser Arg Ser Ile Phe Arg Lys Phe Leu Leu Met Leu Cys Ser Leu
20 25 30

Leu Thr Ser Leu Tyr Val Phe Tyr Cys Leu Ala Glu Arg Cys Gln Thr
35 40 45

Leu Ser Gly Pro Val Val Gly Leu Ser Gly Gly Gly Glu Glu Ala Gly
50 55 60

Ala Pro Gly Gly Gly Val Leu Ala Gly Gly Pro Arg Glu Leu Ala Val
65 70 75 80

Trp Pro Ala Ala Ala Gln Arg Lys Arg Leu Leu Gln Leu Pro Gln Trp
85 90 95

Arg Arg Arg Arg Pro Pro Ala Pro Arg Asp Asp Gly Glu Glu Ala Ala
100 105 110

Trp Glu Glu Glu Ser Pro Gly Leu Ser Gly Gly Pro Gly Gly Ser Gly
115 120 125

Ala Gly Ser Thr Val Ala Glu Ala Pro Pro Gly Thr Leu Ala Leu Leu
130 135 140

Leu Asp Glu Gly Ser Lys Gln Leu Pro Gln Ala Ile Ile Ile Gly Val
145 150 155 160

Lys Lys Gly Gly Thr Arg Ala Leu Leu Glu Phe Leu Arg Val His Pro
165 170 175

Asp Val Arg Ala Val Gly Ala Glu Pro His Phe Phe Asp Arg Ser Tyr
180 185 190

Asp Lys Gly Leu Ala Trp Tyr Arg Asp Leu Met Pro Arg Thr Leu Asp
195 200 205

Gly Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Arg Glu
210 215 220

Ala Pro Ala Arg Ile Ser Ala Met Ser Lys Asp Thr Lys Leu Ile Val
225 230 235 240

Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr
245 250 255

Leu Ser Lys Arg Pro Asp Ile Pro Thr Phe Glu Ser Leu Thr Phe Lys
260 265 270

Asn Arg Thr Ala Gly Leu Ile Asp Thr Ser Trp Ser Ala Ile Gln Ile
275 280 285

Gly Ile Tyr Ala Lys His Leu Glu His Trp Leu Arg His Phe Pro Ile
290 295 300

Arg Gln Met Leu Phe Val Ser Gly Glu Arg Leu Ile Ser Asp Pro Ala
305 310 315 320

Gly Glu Leu Gly Arg Val Gln Asp Phe Leu Gly Leu Lys Arg Ile Ile
325 330 335

Thr Asp Lys His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu
340 345 350

Lys Lys Ala Glu Gly Ser Ser Arg Pro His Cys Leu Gly Lys Thr Lys
355 360 365

Gly Arg Thr His Pro Glu Ile Asp Arg Glu Val Val Arg Arg Leu Arg
370 375 380

Glu Phe Tyr Arg Pro Phe Asn Leu Lys Phe Tyr Gln Met Thr Gly His
385 390 395 400

Asp Phe Gly Trp Asp Gly
405

<210> 9

<211> 2022

<212> DNA

<213> Homo sapiens

<221> CDS

<222> (331)..(1500)
<223> human 3-OST-3B

<400> 9

gbggccaggg cgcigagagtg caAcgtcctc ctggcccega gcgcgtcgtc gcgcjeaeg^g 60

agcagaccct cgccciigcag ttaccgccgt cccgactttcr cgttccoigtt gGagc!t<:?Cftg 120

cogggcaaca tstcaagagc cgccgcogct acagctgcscg ccgccacctg ggg<iiig»gca 180

SCfas<?Agcag cggcfggccgc gggcM^csgg Sggcaataas ccgagqcacc csgggogtcca 240

gcgtgccggg gaac!C2<2tet<2 tgc^gctoact gcccggcggg aecc^icsgicca bgtgctgagG 300

catgtDGctg gecgcgcccg cjgggcagogc atg ggg cag cgc ctg agt ggo gge 354
Met Gly Gln Arg Leu Ser Gly Gly
1 5

aga test t$cf etc gat gtc cdc gga egg ^c?tc c?ta ccg cag oag eeg ccg 402
Arg Ser Cys Leu Asp Val Pro Gly Arg Leu Leu Pro Gln Pro Pro Pro
10 15 20

ck;c ccsg ccg ccg gtg agg ^^gg^ aag etc gog ctg etc ttc gcc atg etc 450
Pro Pro Pro Pro Val Arg Arg Lys Leu Ala Leu Leu Phe Ala Met Leu
25 30 35 40

tgc gtc egg etc tat atg ttc ctg taq to© tgc gcc ggc tec tgc ^co 498
Cys Val Trp Leu Tyr Met Phe Leu Tyr Ser Cys Ala Gly Ser Cys Ala
45 50 55

geg go^ ccg ggg ctg ctg ctq Ctg ggc tct ggg tec cgc gcc gca cac 546
Ala Ala Pro Gly Leu Leu Leu Leu Gly Ser Gly Ser Arg Ala Ala His
60 65 70

gac cog cce gqc Ctg gcc aca get cog gac g^9 «cg ccc ccc agg ctg 594
Asp Pro Pro Ala Leu Ala Thr Ala Pro Asp Gly Thr Pro Pro Arg Leu
75 80 85

.CJCg ttc ogg gcg ccg cca gcC! acc cca ctg get tea ggc ang gag atg 642
Pro Phe Arg Ala Pro Pro Ala Thr Pro Leu Ala Ser Gly Lys Glu Met
90 95 100

gcc gag ggc get ^cg «gc ccg gag gag cag agt ceo gag gtg ccg gac 690
Ala Glu Gly Ala Ala Ser Pro Glu Glu Gln Ser Pro Glu Val Pro Asp
105 110 115 120

tec cca age eco ate tec age ttt tt« a«t ggg tct ggg age? a«g cag 738
Ser Pro Ser Pro Ile Ser Ser Phe Phe Ser Gly Ser Gly Ser Lys Gln
125 130 135

ctg ccg cag gcc ate stc sco ggc gt§ dft^ aag ggc gga acg cs^g gcg 786
Leu Pro Gln Ala Ile Ile Ile Gly Val Lys Lys Gly Gly Thr Arg Ala
140 145 150

Ctg ctg gag ttt ctg cgc gtg csac CfCC gac gtg ogc gca gte* ggc gcc 834
Leu Leu Glu Phe Leu Arg Val His Pro Asp Val Arg Ala Val Gly Ala
155 160 165

gag ccc cat ttc? ttc? gal; cgd s^gc tac gac aag ggc cKC &ct tgg tac 882
Glu Pro His Phe Phe Asp Arg Ser Tyr Asp Lys Gly Leu Ala Trp Tyr
170 175 180

Cf^g gac ctg atg ccc aga acc? ctg gac gg^ oag ate acc atg gag aag 930
Arg Asp Leu Met Pro Arg Thr Leu Asp Gly Gln Ile Thr Met Glu Lys
185 190 195 200

acg CM agt t&c ttc? ^t<f acg egg gag gcc ccc gcsg <^C! **tc tog gcc 978
Thr Pro Ser Tyr Phe Val Thr Arg Glu Ala Pro Ala Arg Ile Ser Ala
205 210 215

atg toe <iag gac acc aag etc ate gtg gtg gtg Cfgg gac ccg gtg aec 1026
Met Ser Lys Asp Thr Lys Leu Ile Val Val Val Arg Asp Pro Val Thr
220 225 230

agg $C=c afcc teg gac tac acg cag acg ctg t^c aag egg occ gacj ate 1074
Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys Arg Pro Asp Ile
235 240 245

ccc ace ttc gag ago ttg acg ttc aaa aac agg ac?& ggc etc ate 1122
Pro Thr Phe Glu Ser Leu Thr Phe Lys Asn Arg Thr Ala Gly Leu Ile
250 255 260

gac «icg tog tgg age gcc ate cag ate ggc ate tac gcc sag cac otg 1170
Asp Thr Ser Trp Ser Ala Ile Gln Ile Gly Ile Tyr Ala Lys His Leu
265 270 275 280

gag cac tgg <:tg csgo e&c ttc CCO atcs cgc cag atg cte ttc gtg age 1218
Glu His Trp Leu Arg His Phe Pro Ile Arg Gln Met Leu Phe Val Ser
285 290 295

gge egg etc ate age gac ccg ^co ggg gag Cbg ggc cgc gtg eaa 1266
Gly Glu Arg Leu Ile Ser Asp Pro Ala Gly Glu Leu Gly Arg Val Gln
300 305 310

gac ttc ctg ggc etc aag a^g ate ate acg gac aag cac ttc tac btc 1314
Asp Phe Leu Gly Leu Lys Arg Ile Ile Thr Asp Lys His Phe Tyr Phe
315 320 325

aac aag acc aag g^e ttc ccc tgc ctg aag aag gcg gag ggc age age 1362
Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Ala Glu Gly Ser Ser
330 335 340

tsgg ccc eat tgc etg ggo aeg acc aag ggc agg acc oat cct 3a$ ate 1410
Arg Pro His Cys Leu Gly Lys Thr Lys Gly Arg Thr His Pro Glu Ile
345 350 355 360

gac cgcj gag gtg gtg cgc agg ctg cgc gag tte tac egg cct ttc aao 1458
Asp Arg Glu Val Val Arg Arg Leu Arg Glu Phe Tyr Arg Pro Phe Asn
365 370 375

etc a<ig tte tac cag atg acc ggg <3ae gae ttt ggc tgg gat 1500
Leu Lys Phe Tyr Gln Met Thr Gly His Asp Phe Gly Trp Asp
380 385 390

tgagcagacc ogggctatgt accttaccca ogtggctbat ctattgacag a^atbatatg 1560
teit^taaaan stacagaaat ctattttata ataattbatt tttaattcat a&gcAatbaa 1620
ttcactaagc tgcctagcca caDtettfiAef a&^i/gttagct tcataatot^ tt««catbcc 1680
aaagtstbta actctagtat ttcsgttfctct tcbtcacaat tgatggtget tCJt^ibttbtt 1740
ceiLCtcccct Bcctigtbata ttbaoaacska agaaaagcac aaottgsgat ttttgttgtt 1800
acgggtattcr agcdttcagt caccgtctga gttctccagt tect^cctcc ttgtcttgtc 1860
ttgggt<?tcc catfcccagct tccctgiiGtic ttoctg<?c!tg t^tacctcgt aggaacgctg 1920
^gctgCJOtCa acagggctgt attctgaagg gcaggccfcCA tgcagcagcc tccttgaaga 1980
tgbggtgbcc cgtccsatga tgtagc c? t^ij i^agccacagc cGtagggttc! t$ 20^5 

<210> 10

<211> 390

<212> PRT

<213> Homo sapiens

<400> 10

Met Gly Gln Arg Leu Ser Gly Gly Arg Ser Cys Leu Asp Val Pro Gly
1 5 10 15

Arg Leu Leu Pro Gln Pro Pro Pro Pro Pro Pro Pro Val Arg Arg Lys
20 25 30

Leu Ala Leu Leu Phe Ala Met Leu Cys Val Trp Leu Tyr Met Phe Leu
35 40 45

Tyr Ser Cys Ala Gly Ser Cys Ala Ala Ala Pro Gly Leu Leu Leu Leu
50 55 60

Gly Ser Gly Ser Arg Ala Ala His Asp Pro Pro Ala Leu Ala Thr Ala
65 70 75 80

Pro Asp Gly Thr Pro Pro Arg Leu Pro Phe Arg Ala Pro Pro Ala Thr
85 90 95

Pro Leu Ala Ser Gly Lys Glu Met Ala Glu Gly Ala Ala Ser Pro Glu
100 105 110

Glu Gln Ser Pro Glu Val Pro Asp Ser Pro Ser Pro Ile Ser Ser Phe
115 120 125

Phe Ser Gly Ser Gly Ser Lys Gln Leu Pro Gln Ala Ile Ile Ile Gly
130 135 140

Val Lys Lys Gly Gly Thr Arg Ala Leu Leu Glu Phe Leu Arg Val His
145 150 155 160

Pro Asp Val Arg Ala Val Gly Ala Glu Pro His Phe Phe Asp Arg Ser
165 170 175

Tyr Asp Lys Gly Leu Ala Trp Tyr Arg Asp Leu Met Pro Arg Thr Leu
180 185 190

Asp Gly Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Arg
195 200 205

Glu Ala Pro Ala Arg Ile Ser Ala Met Ser Lys Asp Thr Lys Leu Ile
210 215 220

Val Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln
225 230 235 240

Thr Leu Ser Lys Arg Pro Asp Ile Pro Thr Phe Glu Ser Leu Thr Phe
245 250 255

Lys Asn Arg Thr Ala Gly Leu Ile Asp Thr Ser Trp Ser Ala Ile Gln
260 265 270

Ile Gly Ile Tyr Ala Lys His Leu Glu His Trp Leu Arg His Phe Pro
275 280 285

Ile Arg Gln Met Leu Phe Val Ser Gly Glu Arg Leu Ile Ser Asp Pro
290 295 300

Ala Gly Glu Leu Gly Arg Val Gln Asp Phe Leu Gly Leu Lys Arg Ile
305 310 315 320

Ile Thr Asp Lys His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys
325 330 335

Leu Lys Lys Ala Glu Gly Ser Ser Arg Pro His Cys Leu Gly Lys Thr
340 345 350

Lys Gly Arg Thr His Pro Glu Ile Asp Arg Glu Val Val Arg Arg Leu
355 360 365

Arg Glu Phe Tyr Arg Pro Phe Asn Leu Lys Phe Tyr Gln Met Thr Gly
370 375 380

His Asp Phe Gly Trp Asp
385 390

<210> 11

<212> DNA

<213> Homo sapiens

<221> CDS

<222> (847)..(2214)
<220>

<223> Predicted human 3-OST-4 mRNA
<400> 11



gaggatatcc ogggcigagag aagggagggt CJ^gggatggg ctgagttgga gtc<?eagag9 60
aaaagCiggaa gcsgagagctt egtc&cccgc tgtcttocag ctoeeggtge 120
aggcaggcgt Ctot^ctoiaas gtactggggc 180
gtcGcg^gCFt Cccaasggag gggagtaaac gaggtggggt 999gaacacc ccaagtgcgt 240
igcg^^9«tggg gggcbggggg gGaDgat.ctc egUtctCFCcg SBtgGCCcag cccCiBgogGa 300
cgcctccgct C!C!CCC!gCC!C!C ^cgcgcgagg cgcaccGCGc ttccctcggc 360
<ggc!g<2e!999c gcgogccc^g GtCfCGCCCfC^ cgcct^bccb ctctocoggc 420
agaa«gtt;iig cagoggggaa ggaact.Gtgg gctgcaacd? cgcgcggcgg cggGggGaga 480
99c!tgaagca gaagcogcgg cggagacggg gaagcggggg cgGtgcagac ggagcaggt-g 540
<5CgcoggD3g gtocgcgcgc ccacc&cggt <*C!C<Jttgcct gaggctgagg ggggggcggt 600
gstggggggg aoactcsggac! tcsggoggerCiA gcJ^tgnsgog gggggccatig 660
occccGtggc gcagc!gr39ac! agcgerc!<;;ag9 gccgggggog cagc^gcg&c gctteatgoa 720
gcxcggsgogg ctgggcageg ^cg^gcjggcs&er cssgi?g3cggc ggcggegg^^ gcggoggcbg 780

aaacoatgtc cgggcagcgo cgggggctgc c^gcc^ccgcc gccgccgcc^g c^agccggga 840

gcogcg atg gcc egg tgg ccc gca ccb cot ccg cct ceg cct ccsg cct 888
Met Ala Arg Trp Pro Ala Pro Pro Pro Pro Pro Pro Pro Pro
1 5 10

C!oa CCt ctg gcc gog cog cog cog ccc ggc gc<e tct get aag ggg acg 936
Pro Pro Leu Ala Ala Pro Pro Pro Pro Gly Ala Ser Ala Lys Gly Pro
15 20 25 30

ccg gog ogc aag ctg qtt ttt fttg tgc 3icc ttg tec ccg tet gfcc acc 984
Pro Ala Arg Lys Leu Leu Phe Met Cys Thr Leu Ser Leu Ser Val Thr
35 40 45

tao ctg tgc t3<:^ a^c etc ctg ggc ggc tog gga tec <?tg Cf«ia tfcc cct 1032
Tyr Leu Cys Tyr Ser Leu Leu Gly Gly Ser Gly Ser Leu Gln Phe Pro
50 55 60

CtJ Scg ctg cag gag teg ccg ggc gcc goe gcc gag ccc cog ccg a$'« 1080
Leu Ala Leu Gln Glu Ser Pro Gly Ala Ala Ala Glu Pro Pro Pro Ser
65 70 75

ccg cog cea eec tct ctg cstj <3<>fc aoc coc gtg cgc etc ggc $cc ccc 1128
Pro Pro Pro Pro Ser Leu Leu Pro Thr Pro Val Arg Leu Gly Ala Pro
80 85 90

teg e^g ccsg ccc gcg ccg ccg ccg ctg gac aac gcg age cac ggg gag 1176
Ser Gln Pro Pro Ala Pro Pro Pro Leu Asp Asn Ala Ser His Gly Glu
95 100 105 110

ccg ccc gag ccc cca gag cag eoa gcc? gcc ccc ggg acc gac gge 1224
Pro Pro Glu Pro Pro Glu Gln Pro Ala Ala Pro Gly Thr Asp Gly Trp
115 120 125

ggg ctg ccsg ago ggo ggo gga ggc gcc egg gac gcc tgg Obc egg acc 1272
Gly Leu Pro Ser Gly Gly Gly Gly Ala Arg Asp Ala Trp Leu Arg Thr
130 135 140

cog ctg gee ccc age? gag atg ^tc *cg gob cag age geg otg ccff gag 1320
Pro Leu Ala Pro Ser Glu Met Ile Thr Ala Gln Ser Ala Leu Pro Glu
145 150 155

agg gaa g<5g oag gag tec age acc acc gac gag gat etc gca ggc ogg 1368
Arg Glu Ala Gln Glu Ser Ser Thr Thr Asp Glu Asp Leu Ala Gly Arg
160 165 170

<iga gcg gcc aac ggg ago age gag agg ggc ggc gcc gta age acc ccc 1416
Arg Ala Ala Asn Gly Ser Ser Glu Arg Gly Gly Ala Val Ser Thr Pro
175 180 185

gac tal; ggg gag aag aag ctg cca cag gog etc «tc ate ggg gtc aag 1464
Asp Tyr Gly Glu Lys Lys Leu Pro Gln Ala Leu Ile Ile Gly Val Lys
195 200 205

aaa gga ggg acc ego gcg atg ctg gag gc?g ate ogc gtg cac eeg gac 1512
Lys Gly Gly Thr Arg Ala Leu Leu Glu Ala Ile Arg Val His Pro Asp
210 215 220

gtg egg gog gtg ggc gta gag cac ttc ttc gac agg aac tao g«a 1560
Val Arg Ala Val Gly Val Glu Pro His Phe Phe Asp Arg Asn Tyr Glu
225 230 235

MS ttg gag tgg tac aga aat gtg atg ccc a»g act ttg gat ggg 1608
Lys Gly Leu Glu Trp Tyr Arg Asn Val Met Pro Lys Thr Leu Asp Gly
240 245 250

caa ata aeo atg gag asg act ecA c^gt tac ttt. g^^ aca »at gag get 1656
Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Asn Glu Ala
255 260 265 270

ccc aag cgc Att cko tec atg gcc* ifcAg gac ate saa ctj atb gtg gtg 1704
Pro Lys Arg Ile His Ser Met Ala Lys Asp Ile Lys Leu Ile Val Val
275 280 285

gtg ag^ aac ccc gtg ace agg gcc ate tct gao tac acg cag aca ctg 1752
Val Arg Asn Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu
290 295 300

tea aag aaa ccc g^ag ate ccc ace ttt ^Ag gtg ctg gcc tt<J aaa aac 1800
Ser Lys Lys Pro Glu Ile Pro Thr Phe Glu Val Leu Ala Phe Lys Asn
305 310 315

egg ^kcc cto sgg ctg ate gat get tec tgg agt gcic att oga ata ggg 1848
Arg Thr Leu Gly Leu Ile Asp Ala Ser Trp Ser Ala Ile Arg Ile Gly
320 325 330

ate tat gcg ctg eal; ctg ^aa aac tgg cfcc oag tat ttc cec etc tec 1896
Ile Tyr Ala Leu His Leu Glu Asn Trp Leu Gln Tyr Phe Pro Leu Ser
335 340 345 350

cag ate cte ttt gto agt ggt gag cga <?tc att gtg gac CCC gcc ggg 1944
Gln Ile Leu Phe Val Ser Gly Glu Arg Leu Ile Val Asp Pro Ala Gly
355 360 365

gAa atg gcc aaa gt^ c&g gat ttt eta gge etc aaa cgt gtt gtg act 1992
Glu Met Ala Lys Val Gln Asp Phe Leu Gly Leu Lys Arg Val Val Thr
370 375 380

aag aag eat ttc tat ttc aac aaa acc a«ig ggg ttc crt tg<= Cta aag 2040
Lys Lys His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys
385 390 395

aag caa g^a gac age agt gee ccg Ajg tgc tta gge a»9 ago aaa ggt 2088
Lys Pro Glu Asp Ser Ser Ala Pro Arg Cys Leu Gly Lys Ser Lys Gly
400 405 410

egg act cat cct age act sac cca gat gte ate Cao age ctg agg aaa 2136
Arg Thr His Pro Arg Ile Asp Pro Asp Val Ile His Arg Leu Arg Lys
415 420 425 430

tt<3 tae aaa ccc ttc aac ttg atg ttt tac caa atg act ggt caa gat 2184
Phe Tyr Lys Pro Phe Asn Leu Met Phe Tyr Gln Met Thr Gly Gln Asp
435 440 445

ttt cag tgg ga<i cag ^aa gag ggt gat aaa tgaggctaga gaggcagagg 2234
Phe Gln Trp Glu Gln Glu Glu Gly Asp Lys
450 455



aaggctagtc ^ataagctaa ggaggetcct tgcobgagtc ettgaatacc ccagcttccg 2294
cagctteac^t tgctggagcg coaagtagat CtCCtCCtCG Ctcabgcagc caggattgec 2354
tecagrtgctg ttagcttagg caaacaggtg gatcccatg^ catccccatg gaggaaccag 2414
gcccatctgg gcagcagcat cbggttgacG agatg^CCac cagaacccac tgttcattct 2474
tatcttctge tai^ttaatat agcctgaaga ca^ag^ataa atagttgtca abgtcagaga 2534
cagtgctatt aatgtatatg tgagcgacaa aaaaggtctg ctttatas99 gtbctcactc 2594
tagcttgggg agcccagggt tctagecctg tatctgtcat gggrc^cetgc tgtctaaacc 2654
tctgcttggg cttctcccca gaatgcaCFtt tgtggctgag tgctccagga ctcctaggga 2714
gco^asatect ccctctaagg tgtttctagt ebtcbcttta sajggtc&oak cccacaac cc 2774
ctgacttcct ccctzcccoac at^ct^tgadsr? cagaggcatg oacAttccto ac&gaaaaag 2834
aaaacacaca ccc^cccaca oacacacaca cagaagaaaa t^^aotgctga cacacotcga 2894
agccttcttt coaagagccc tctaaatggg gttgggtctcs actcttcatg agtatcctjg 2954
gttgtgeag*^ «gcttagcat atgcccttgt gttaggatc^i sgcccscagg gct^^ctca^a 3014
gagt^ga^ta attgtaaccg aggtcagagc tctggggttg gcagagatga gtg<j<5cata t 3074
cfcgggggtaa aagaagaaac GctgtcGtcd tggt9^H.gg ttacatt^cc? tgaagaccat 3134
ctctcGcaag cactgtagtt ctgagCAigt ttfctggggtrg gapfcctgtcc cctagggtcc* 3194
ctagaagggc^ aa^^CCA^a gejgttgacaa stctgttatt asgaataatc cttagccatg 3254
taatggag^« aggagoagto agcattcttc Gaatttge<2c caccaccacc tcctegggcfc 3314
tc7Att^tt.ctc ta^&tagaga tggcagagag tgagg^^gtg gcgagaaagc tIgac!t<^c^2ltt 3374
<7atca9aC;cc ag^ttatgag ggbtgggggt gagcaagggc tg^ctgcaga ASL<^cc<sC^t^ 3434
nagwgctgct gaat>gaagtg tcccttocea tcagtttgat tcaattaaaa fe^CAtcattt 3494
gacataaagc acttgttea^? agatctCfCSt^ ASi^^caggaat tgttetasta aaactggaaa 3554
tttgtatgftg t^gggggagt taaatctgtt cagctgtt^t taaactgbca tttctcccgc 3614
taaacgaMA ccgtgbtgtt ataaagc&ta atgcaacctg atta 3^56 

<210> 12

<211> 456

<212> PRT

<213> Homo sapiens

<400> 12

Met Ala Arg Trp Pro Ala Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro
1 5 10 15

Leu Ala Ala Pro Pro Pro Pro Gly Ala Ser Ala Lys Gly Pro Pro Ala
20 25 30

Arg Lys Leu Leu Phe Met Cys Thr Leu Ser Leu Ser Val Thr Tyr Leu
35 40 45

Cys Tyr Ser Leu Leu Gly Gly Ser Gly Ser Leu Gln Phe Pro Leu Ala
50 55 60

Leu Gln Glu Ser Pro Gly Ala Ala Ala Glu Pro Pro Pro Ser Pro Pro
65 70 75 80

Pro Pro Ser Leu Leu Pro Thr Pro Val Arg Leu Gly Ala Pro Ser Gln
85 90 95

Pro Pro Ala Pro Pro Pro Leu Asp Asn Ala Ser His Gly Glu Pro Pro
100 105 110

Glu Pro Pro Glu Gln Pro Ala Ala Pro Gly Thr Asp Gly Trp Gly Leu
115 120 125

Pro Ser Gly Gly Gly Gly Ala Arg Asp Ala Trp Leu Arg Thr Pro Leu
130 135 140

Ala Pro Ser Glu Met Ile Thr Ala Gln Ser Ala Leu Pro Glu Arg Glu
145 150 155 160

Ala Gln Glu Ser Ser Thr Thr Asp Glu Asp Leu Ala Gly Arg Arg Ala
165 170 175

Ala Asn Gly Ser Ser Glu Arg Gly Gly Ala Val Ser Thr Pro Asp Tyr
180 185 190

Gly Glu Lys Lys Leu Pro Gln Ala Leu Ile Ile Gly Val Lys Lys Gly
195 200 205

Gly Thr Arg Ala Leu Leu Glu Ala Ile Arg Val His Pro Asp Val Arg
210 215 220

Ala Val Gly Val Glu Pro His Phe Phe Asp Arg Asn Tyr Glu Lys Gly
225 230 235 240

Leu Glu Trp Tyr Arg Asn Val Met Pro Lys Thr Leu Asp Gly Gln Ile
245 250 255

Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Asn Glu Ala Pro Lys
260 265 270

Arg Ile His Ser Met Ala Lys Asp Ile Lys Leu Ile Val Val Val Arg
275 280 285

Asn Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys
290 295 300

Lys Pro Glu Ile Pro Thr Phe Glu Val Leu Ala Phe Lys Asn Arg Thr
305 310 315 320

Leu Gly Leu Ile Asp Ala Ser Trp Ser Ala Ile Arg Ile Gly Ile Tyr
325 330 335

Ala Leu His Leu Glu Asn Trp Leu Gln Tyr Phe Pro Leu Ser Gln Ile
340 345 350

Leu Phe Val Ser Gly Glu Arg Leu Ile Val Asp Pro Ala Gly Glu Met
355 360 365

Ala Lys Val Gln Asp Phe Leu Gly Leu Lys Arg Val Val Thr Lys Lys
370 375 380

His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Pro
385 390 395 400

Glu Asp Ser Ser Ala Pro Arg Cys Leu Gly Lys Ser Lys Gly Arg Thr
405 410 415

His Pro Arg Ile Asp Pro Asp Val Ile His Arg Leu Arg Lys Phe Tyr
420 425 430

Lys Pro Phe Asn Leu Met Phe Tyr Gln Met Thr Gly Gln Asp Phe Gln
435 440 445

Trp Glu Gln Glu Glu Gly Asp Lys
450 455

<210> 13
<211> 284
<212> PRT
<213> Homo sapiens

<220>

<223> human NST-1 (aa 599 to 882)
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<400> 13

I>>riEi Thr Cys A.3p Arg Phe Pro Lya Leu Leu xle CILy Pro Gin Lys 

15 10 a& . 

Thr Gly Phr 'Chir Ala X.eu Tyaf Leu Ph« Leu Gly Met His Pro Aep Leu 
20 25 ^0 

Ser Ser Aan Tyr Pro Seir Sesr alu That" Phi^ Olu Glu lie Gin Ph& Phe 
35 40 45 

Am Oly tJis Abu Tyr His Lya Gly lie Jisp Ti?p ryr W«^b Glu Phe Phe 
5D 55 SO 

Pro Jl« Pro fl»r Asn Thr Tirr Ser Aap Phe Tyr Phi^ aiu l>yB Ser Ala 
65 7P 7S eo 

Act) Tyr Phe Asp E^tDsr Olu Val Ala Pro Arg A^ Ala Ala Aln 2>u Lau 

85 90 35 

Pro Z<yB Ala U/u. Val 3i«u Thar lis Leu lie Aan Pro Ala Aep Arg Ala 
100 1D5 110 

'r>'r Ser T5fp Tyr Gin Hla Gin Arg Ala Eis^e Asp Asp Pro Val Ala Leu 

12D ' L2& 

Lye Tyr Thr Phe Hia Olu Vsl Tie Thr Ala Qly B^x Asp Ala Ser Ser 
130 135 140 

Lya leu Arcf Ala Leu Oln Aan Arg Cyo L«u Val Pro Gly Trp ryr Ala 
145 1^0 15S 160 

T!hr ^li^ Olu Ar^ L&u Sox Ala Tyr His Ala Asn Oln zl^ Leu 

155 170 175 

Val Leu Asp Gly Lya Leu Leu Arg Thx QlXk Pro Ala Lys Val Met Asp 
laD IBS 1^0 

Hot V^'l Qln Lya Phe Leu Gly Val Thr Aan l^hr 7lo Hffp Tyr His Lys 
195 300 205 

oO^r Jieu Ala Phd A^p Fro Lya Lye Gly Phe 7rp Cys Oln Leu Lou C3lu 
2lO 215 22D 

C3ly -Qly Lya Thx Lye Cya Leu Gly Lye Ser Lyes qly Arsf 1//^ Tyr Pro 
225 :i30 235 240 

Glu M^t Ae^ Ldu Adp Ars Ala PhB Leu Lya Asp Tyr Tyr Arg Aisp 

245 250 255 

Hia Aan He glu L«U Ser Lye? Lau Leu Tyr Lya Met Gly Gin Thr Leu 
26D 265 270 

Pro Thr Trp Leu Arq Glu Aap Leu qln Aecn Thr Arg 
275 260 



<210> 14 
<211> 286
<212> PRT

<213> Homo sapiens
<220> 

human (aa 59B to SB2i 

<400> 14 

Lye Thr Cyfr Aap Ar^ Lou Pro Lya Phe Leu He Val Gly Pro qin Ly» 
15 10 15 
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Thar Qiy Thr Thx Ala lie Kig Phe Ph© Leu Ser His Ala Val 

20 25 30 

Thr Set flei: Pha Pro Ser Pro Sear TJir Pho Glu Glu rl^ OLn l^he Phe 
3S 4D 45 

Aeii ser Prq Asn Tyr His iya Gly lie JUip TSt> T>a^ Met Jiap Pho Phe 
50 ^5 «0 

Pro val Pro S&r A^n Alei Bor thr Asp Phe Leu Phe Glu Lye Ser Ala 
5^ 70 75 BO 

T^r Tyr Phe Asp Ser Glu Val Val Pro Arg Arg Qly Alei Ala Leu lau 
eS 95 

Pro Arg AiA Lys He lie Thr Val i^tu Thr Asn Pro Al$i Asp Arg Ala 
IDO lOS IID 

Tyr Ser Trp Tyr 02n His Gin Arg Ala Hie OXy Asp Pro val Ala Lreu 
115 t20 12S 

Asn ryr Thx Phe Tysr OIn Val lie Ser Ala ser &©r Qln Thr Pro Leu 
130 13& 1«0 

Ala Leu Arg Ser Leu Glu A4;n Axg Cys Leu Val Pro Oly Tyr Tyr Ser 
145 150 155 16D 

Thr Hlc Leu Gin Arg Trp Leu Tht Tyr Tyr Pro Ser Oly QJlm Leu Leu 
1^5 170 175 

lie Val A&o aiy Gin Glu Leu Axg Thr Aen Pxo Ala Ala Ser K©t Glu 
IfiD IBS XdO 

Sar He Gin Ijy« Ph« Lou Qly He Thr Pxo Phe Leu Aan Tyr ^r Arg 
i&5 3100 20& 

Thr Leu Arg Phe Afisp Afisp^ A«p Lye Sly Pha Txp CV«s (Jin Gly Leu Glu 
21& 215 220 

(31 y Oly Lys Thr Arg Cy« LftU Oly Arg Ser Lya Oly Ar^ Axg Tyr Pro 
225 SSD 235 240 

Asp Nee Asp Thr (31u Ser Arg Xeu Phe L«u Thr Aap Fhe Phe Axg Asn 
24S 250 2$^ 

His Asn L$u Qlu Leu Ber Lya Xeu Leu Ser Axg Leu Gly Gin Pro Val 
260 26S 27& 

Pxo ser Txp Leu Arg <allu ^lu Leu Gin ZHb €«r 8«r Leu Gly 
^75 280 2G5 



<210> 15

<211> 291

<212> PRT

<213> Caenorhabditis elegans

<223> putative C. elegans 2-OST
<400> 15

M«b Lya Tyr Axg Leu Leu Leu He Leu Hia Leu ILe Asp Leu lie Ser 
15 10 15 

Cya Gly Val He Pre Afftt Thr Ser Lya Lya Arg Phe Pxo Aap Ala Jle 
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(c) residues 42-109, 313-325, 303-332, or 110-367 of SEQ ID NO: 6;

(d) residues 44-147, 351-363, 341-370, or 148-406 of SEQ ID NO: 8;

(e) residues 66-132, 336-348, 326-355, or 133-390 of SEQ ID NO: 10;

(f) residues 396-408, 386-415, or 207-456 of SEQ ID NO: 12;

(g) residues 240-250, 230-257, 23-291 of SEQ ID NO: 15;

(h) a sequence having at least 60% amino acid sequence similarity with at least one of (a)-(g) and encoding a functional fragment having sequence-specific HS binding affinity or 3-O-sulfotransferase activity; and

(i) a sequence comprising a chimera of at least two of sequences (a)-(h).

7. An isolated nucleic acid comprising at least 18-20 consecutive nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, and SEQ ID NO: 11.

8. A host cell transformed with a nucleic acid of any one of claims 1-7, or a descendant thereof.

9. A host cell as in claim 8 wherein said host cell is selected from the group consisting of bacterial cells, yeast cells, and insect cells.

10. A host cell as in claim 8 wherein said host cell is selected from the group consisting of somatic cells, fetal cells, embryonic stem cells, zygotes, gametes, germ line cells, and transgenic animal cells.

11. A host cell as in claim 8 wherein said cell is a mammalian cell.

12. A host cell as in claim 11 wherein said cell is selected from the group consisting of COS-7 cells, CHO, murine primary cardiac microvascular endothelial cells (CME), murine mast cell line C57.1, human primary endothelial cells of umbilical vein (HUVEC), F9 embryonal carcinoma cells, rat fat pad endothelial cells (RFPEC), L cells, and cells derived from the transgenic animals of the invention.

13. A substantially pure protein preparation comprising at least a functional fragment of a 3-OST protein.

14. A substantially pure protein preparation as in claim 13 wherein said 3-OST protein is selected from the group consisting of mature murine 3-OST-1 and mature human 3-OST-1.

AMENDED SHEET (ARTICLE 19) 
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